From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!aioe.org!.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Newsgroups: comp.lang.ada
Subject: Re: Community Input for the Maintenance and Revision of the Ada
 Programming Language
Date: Thu, 31 Aug 2017 15:16:11 +0200
Organization: Aioe.org NNTP Server
Message-ID: <oo926q$1rj7$1@gioia.aioe.org>
References: <oludae$seh$1@franka.jacob-sparre.dk>
 <79e06550-67d7-45b3-88f8-b7b3980ecb20@googlegroups.com>
 <9d4bc8aa-cc44-4c30-8385-af0d29d49b36@googlegroups.com>
 <1395655516.524005222.638450.laguest-archeia.com@nntp.aioe.org>
 <omgrf6$5nf$1@dont-email.me>
 <4527d955-a6fe-4782-beea-e59c3bb69f21@googlegroups.com>
 <22c5d2f4-6b96-4474-936c-024fdbed6ac7@googlegroups.com>
 <aab7a027-f0e8-4278-bc5c-9492ca4ccefe@googlegroups.com>
 <1919594098.524164165.354468.laguest-archeia.com@nntp.aioe.org>
 <85d4930c-d4dc-4e4f-af7a-fd7c213b8290@googlegroups.com>
 <oml2tl$11i5$1@gioia.aioe.org> <oml6m6$qgq$1@franka.jacob-sparre.dk>
 <725b229b-f768-4603-b564-4751e5e7136f@googlegroups.com>
 <87ziag9ois.fsf@jacob-sparre.dk> <oo8m80$17c1$1@gioia.aioe.org>
 <87val3aoly.fsf@jacob-sparre.dk>
NNTP-Posting-Host: vZYCW951TbFitc4GdEwQJg.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.3.0
Content-Language: en-US
X-Notice: Filtered by postfilter v. 0.8.2
Xref: news.eternal-september.org comp.lang.ada:47855
Date: 2017-08-31T15:16:11+02:00
List-Id: <comp.lang.ada>

On 31/08/2017 14:49, Jacob Sparre Andersen wrote:
> Dmitry A. Kazakov wrote:
> 
>> You need a view of a string as an array of code points / unicode
>> characters *and* another view as an array of encoding items,
>> e.g. octet for UTF-8 or word for UTF-16 etc.
> 
> But the encoding stuff is (mostly) on the out-side of the application.

Not really. Parsing, for example, is done on octets: every octet of a 
multi-octet UTF-8 sequence is above 127, so an ASCII delimiter can be 
found without decoding anything. That is why UTF-8 was designed this way.
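
A minimal sketch of octet-level parsing, written in Python rather than Ada 
only for brevity. The function name split_fields and the sample data are 
made up for illustration. It relies on the UTF-8 property that octets 
below 128 never occur inside a multi-octet sequence, so splitting on an 
ASCII delimiter needs no decoding:

```python
def split_fields(data: bytes, delimiter: bytes = b",") -> list[bytes]:
    """Split UTF-8 encoded octets on an ASCII delimiter without decoding.

    Safe because every octet of a multi-octet UTF-8 sequence has its high
    bit set, so an ASCII octet cannot appear in the middle of a character.
    """
    return data.split(delimiter)


# Hypothetical input: three fields containing non-ASCII characters.
line = "naïve,Ünïcödé,日本語".encode("utf-8")

fields = split_fields(line)

# Each field is still valid UTF-8 and can be decoded on its own if needed.
decoded = [field.decode("utf-8") for field in fields]
```

The parser never looks at code points. Decoding happens afterwards, and 
only for the fields that actually need it.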

> I don't mind having routines for mapping to and from various encodings,
> but the encoded types should not have character or string literals, they
> should just be arrays of octets with certain characteristics.

I don't understand this. What is the use of a string type without 
literals? Unbounded_String is a perfect example of why this does not work.

It is all about being able to choose a representation (encoding) 
rather than having it forced upon you by the language. It is no 
solution to simply create yet another type with the required 
representation, losing the original type's interface and being forced 
to convert back and forth between the two types all over the place. 
This mess does not deserve consideration. We have it aplenty: String, 
Wide_String, Unbounded_String, char_array, chars_ptr, ad infinitum.

> Don't worry so much about the encoding-view.  Push the encoding troubles
> to the edge of your application, and work in a consistent form inside
> the application.

Many, if not most, applications never care about code points. 
Applications deal with substrings that start and end at code point 
boundaries. For them it does not matter whether the substring is a 
chain of code points or a chain of encoding items.
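
To illustrate, here is a small sketch in Python (again not Ada; the helper 
names and the sample text are invented for the example). It extracts the 
same substring once through the code point view and once through the 
octet view. The indices differ, but the substring is the same, because a 
search for valid UTF-8 inside valid UTF-8 can only match at a code point 
boundary:

```python
def find_in_code_points(text: str, needle: str) -> tuple[int, int]:
    """Return (start, end) of needle in text, counted in code points."""
    start = text.find(needle)
    return start, start + len(needle)


def find_in_octets(data: bytes, needle: bytes) -> tuple[int, int]:
    """Return (start, end) of needle in data, counted in UTF-8 octets."""
    start = data.find(needle)
    return start, start + len(needle)


# Hypothetical sample text mixing one-octet and two-octet characters.
text = "Grüße, мир!"
needle = "мир"

cp_start, cp_end = find_in_code_points(text, needle)
oct_start, oct_end = find_in_octets(text.encode("utf-8"),
                                    needle.encode("utf-8"))

# The two views disagree on positions but agree on the substring itself.
from_code_points = text[cp_start:cp_end]
from_octets = text.encode("utf-8")[oct_start:oct_end].decode("utf-8")
```

Here the substring starts at code point 7 but at octet 9, and the 
application needs to know neither number as long as it stays within one 
view.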

>> You cannot handle this in present Ada.
> 
> You can, if you harmonize to a single encoding for the character and
> string view, and only see specific encodings as serializations of
> (subsets of) the general character and string types.
> 
> The places I expect to see trouble is if some source text assumes that
> Standard.Character and Interfaces.C.char are the same.

It should assume that Standard.Octet and Interfaces.C.char are the same. 
You simply cannot drop either encoding items or code points. It would 
never work.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de