From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Community Input for the Maintenance and Revision of the Ada Programming Language
Date: Thu, 31 Aug 2017 18:08:17 +0200

On 2017-08-31 17:45, Simon Wright wrote:
> "Dmitry A.
> Kazakov" writes:
>
>> On 31/08/2017 16:09, Jacob Sparre Andersen wrote:
>
>>> As I see it, there is nothing wrong with reading a sequence of octets
>>> containing an UTF-8 encoded string, mapping it to the internal
>>> encoding, and *then* parse the text.
>>
>> UTF-8 *is* the internal encoding. It is the best representation for
>> most cases.
>
> But see the thread beginning at [1], and specifically [2], for the
> effect of different normalization forms ..
>
> [1]
> https://groups.google.com/d/msg/comp.lang.ada/ZhDARPQ8deQ/fubEjsggBAAJ
>
> [2]
> https://groups.google.com/d/msg/comp.lang.ada/ZhDARPQ8deQ/6v-c9SmNAQAJ

I would not blame Unicode alone. There are a lot of characters that are
the same and different at the same time, e.g. а /= a (the first one is
Cyrillic).

Apart from the advice never, ever to normalize anything, the problem
arises from the attempt to attach some meaning to the string. It is a
slippery slope [*]. It is also an argument against the conversions Jacob
hopes will be the solution.

--------------------------------------------
* The problem is that names that are equivalent under an OS are not
necessarily invariant under a given conversion. We cannot resolve it at
the language level. It is hopeless.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
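
The point about look-alike characters and normalization forms can be illustrated with a minimal sketch. It is written in Python rather than Ada and uses only the standard `unicodedata` module; the variable and function names are illustrative assumptions, not anything taken from the thread.

```python
import unicodedata

latin_a = "a"          # U+0061 LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # U+0430 CYRILLIC SMALL LETTER A, looks the same on screen


def same_after(form, s, t):
    """Return True if s and t are equal once both are normalized to `form`."""
    return unicodedata.normalize(form, s) == unicodedata.normalize(form, t)


# None of the four standard normalization forms unifies the two letters.
lookalike = {
    form: same_after(form, latin_a, cyrillic_a)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

# The opposite case: two different code point sequences for the same text.
composed = "\u00e9"     # precomposed e with acute accent
decomposed = "e\u0301"  # letter e followed by COMBINING ACUTE ACCENT

raw_equal = composed == decomposed                   # plain comparison
nfc_equal = same_after("NFC", composed, decomposed)  # comparison after NFC

# The UTF-8 octet sequences differ as well.
composed_octets = composed.encode("utf-8")
decomposed_octets = decomposed.encode("utf-8")
```

Here the two letters stay distinct under every normalization form, while the two spellings of the accented letter compare unequal as raw strings and equal only after normalization. Whether two such spellings name the same file is decided by the operating system and file system, not by the string library, which is the difficulty the footnote describes.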