From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Community Input for the Maintenance and Revision of the Ada Programming Language
Date: Thu, 31 Aug 2017 15:16:11 +0200

On 31/08/2017 14:49, Jacob Sparre Andersen wrote:
> Dmitry A. Kazakov wrote:
>
>> You need a view of a string as an array of code points / unicode
>> characters *and* another view as an array of encoding items,
>> e.g. octet for UTF-8 or word for UTF-16 etc.
>
> But the encoding stuff is (mostly) on the out-side of the application.

Not really. E.g. parsing is done in octets for obvious reasons. That was
the reason why UTF-8 was designed this way.

> I don't mind having routines for mapping to and from various encodings,
> but the encoded types should not have character or string literals, they
> should just be arrays of octets with certain characteristics.

I don't understand this. What is the use of a string type without
literals? Unbounded_String is a perfect example of why this does not work.

It is all about being able to choose a representation (encoding) rather
than having it forced upon you by the language. It is no solution to
simply create yet another type with the required representation, losing
the original type's interface and being forced to convert back and forth
between the two types all over the place. This mess does not deserve
consideration. We have it aplenty: String, Wide_String, Unbounded_String,
char_array, chars_ptr, ad infinitum.

> Don't worry so much about the encoding-view. Push the encoding troubles
> to the edge of your application, and work in a consistent form inside
> the application.

Many, if not most, applications never care about code points.
Applications deal with substrings starting and ending at the boundary of
a code point. For them it does not matter whether the substring is a
chain of code points or a chain of encoding items.

>> You cannot handle this in present Ada.
>
> You can, if you harmonize to a single encoding for the character and
> string view, and only see specific encodings as serializations of
> (subsets of) the general character and string types.
>
> The places I expect to see trouble is if some source text assumes that
> Standard.Character and Interfaces.C.char are the same.

It should assume that Standard.Octet and Interfaces.C.char are the same.

You simply cannot drop either encoding items or code points. It would
never work.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
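
The two views discussed above, and the remark that parsing can be done in octets, can be illustrated with a minimal sketch. It is written in Python rather than Ada purely for brevity. The function name `split_fields` and the sample text are hypothetical and chosen only for illustration. The sketch relies on a known property of UTF-8: every octet of a multi-octet sequence is 0x80 or above, so an ASCII delimiter octet never appears inside an encoded code point.

```python
def split_fields(data: bytes, sep: bytes = b",") -> list[bytes]:
    """Split UTF-8 encoded data on an ASCII separator, looking only at octets."""
    # Safe for UTF-8: lead and continuation octets of multi-octet sequences
    # are all >= 0x80, so they can never be mistaken for an ASCII separator.
    return data.split(sep)


text = "Ærø,Москва,東京"  # hypothetical sample input

# View 1: the string as an array of code points.
code_points = [ord(ch) for ch in text]

# View 2: the same string as an array of encoding items.
octets = text.encode("utf-8")                      # octets for UTF-8
utf16 = text.encode("utf-16-le")
words = [int.from_bytes(utf16[i:i + 2], "little")  # 16-bit words for UTF-16
         for i in range(0, len(utf16), 2)]

# Parsing on the octet view: every field still decodes to whole code points.
fields = [f.decode("utf-8") for f in split_fields(octets)]

# A substring found on the octet view starts and ends on code point boundaries.
needle = "Москва".encode("utf-8")
start = octets.find(needle)
found = octets[start:start + len(needle)].decode("utf-8")
```

Here the substring has a different index in each view (position 4 as code points, position 6 as octets), yet both views delimit the same text, which is why an application that only handles whole substrings may not need to care which view it is working on.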