Newsgroups: comp.lang.ada
Date: Wed, 4 Jul 2018 07:37:38 -0700 (PDT)
Subject: Re: Strange crash on custom iterator
From: "Dan'l Miller"

On Wednesday, July 4, 2018 at 8:27:53 AM UTC-5, Dmitry A. Kazakov wrote:
> On 2018-07-04 13:30, J-P.
> Rosen wrote:
> > On 04/07/2018 at 12:01, Dmitry A. Kazakov wrote:
> >> But UTF-8 is actually more efficient in most cases than
> >> Wide_Wide_String. Random string indexing is practically never used.
> > !!!! I, and many others, often need to search substrings within a
> > string; actually, I would have a hard time finding an example of string
> > manipulation without indexing...
> >
> >>> We discussed that point, and the agreement was that making a different
> >>> type would force the user to many conversions that would bring nothing
> >>> but trouble, and make Ada once again look impractical out of excessive
> >>> purism.
> >>
> >> Exactly my point. Explicit conversion are necessary because Ada's type
> >> system is unable to model strings in a type-safe way.
> > So, you want different types, plus a typing system that would allow to
> > mix the types and make them compatible.
>
> Yes, because they are semantically same: arrays of code points.
>
> > .. You might as well put
> > everything in the same type!
>
> No, because they must have different representations.
>
> > Anyway, the ARG has to deal with Ada as it is, not as Dmitry dreams it
> > should be...
>
> It requires someone more influential, wise and knowledgeable than me to
> make and then push such a proposal. I would be satisfied if more people
> saw the roots of problems with strings etc.

I think that perhaps /all/ readers of this thread see at least one •problem• with UTF-8 in Ada's String (and perhaps with Unicode/ISO10646 in Ada in general, regardless of the choice of encoding, so in Wide_String and Wide_Wide_String too).

The difficulty is that •no one• has a single •solution• for this problem or its concomitant problems. Not even J-P.
Rosen possesses a complete solution with his Wide_Wide_String recommendation, because his replies seem to imply, incorrectly, that Unicode/ISO10646 has a fully-normalized single-codepoint character for each grapheme/letter. The following article provides 7 examples in 4 languages (2 of which are European languages, no less!) where a single grapheme's most-compact representation in Unicode/ISO10646 is a multi-codepoint sequence.

The absolutely most infamous of these 7 examples is the Lithuanian one. Through flukes of sociopolitical history, Vietnamese, French, German, and so forth all had pre-1992 ISO standards or IBM-Microsoft-Apple code pages for their letters with diacritics, so those letters got standardized in Unicode/ISO10646 as single codepoints, e.g., ü as U+00FC instead of u U+0075 followed by the combining diaeresis U+0308. Poor old Lithuania was under Soviet occupation from 1944 to 1991, during which the Soviets tried to suppress the Lithuanian language. Due to this suppression, the Soviet character-encoding standards never standardized encodings for Lithuanian letters with all the Lithuanian-specific diacritical marks, such as the 2 example letters given in the article linked above. Because the timespan was so short between the end of the Soviet occupation of Lithuania in 1991 and the 1992 cut-off for pre-existing character-encoding standards whose characters Unicode/ISO10646 had to encode as single codepoints, poor old Lithuanian characters are 2nd-class citizens in Unicode/ISO10646, whereas all the Western European languages (and those of their former colonies) with diacritical marks are first-class citizens in Unicode/ISO10646.
This has caused something of a protracted, slow-motion, multidecade trench war between Lithuania and Unicode/ISO10646, made worse every time someone elsewhere on the planet whips up a brand-new single-codepoint character that has never ever existed in the history of humankind and then gets this contrived single-codepoint grapheme standardized in Unicode/ISO10646.

Oh, but Japan and Silicon Valley can devise emojis galore in recent years and not be restricted by strict enforcement of this no-preexisting-character-encoding rule. Why? I guess because emojis are cool, but Lithuanian characters are booooorrrrrrrring.
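
For anyone who wants to see the single-codepoint versus multi-codepoint difference concretely, here is a minimal sketch. It is in Python rather than Ada only because Python's standard library ships the Unicode database (`unicodedata`). The helper name `nfc_code_points` is made up for this illustration, and the Lithuanian letter used (e with dot above and tilde) is my own choice of example, not necessarily one of the letters in the article.

```python
import unicodedata


def nfc_code_points(text):
    """Return the code points of `text` after NFC normalization, as 'U+XXXX' strings.

    NFC composes base letters and combining marks into precomposed
    characters wherever Unicode defines one.
    """
    composed = unicodedata.normalize("NFC", text)
    return ["U+%04X" % ord(ch) for ch in composed]


# German u-umlaut: base letter first, combining diaeresis second.
u_umlaut = "u\u0308"

# Lithuanian e with dot above and tilde (illustrative choice, see lead-in):
# base letter, combining dot above, combining tilde.
e_dot_tilde = "e\u0307\u0303"

print(nfc_code_points(u_umlaut))     # ['U+00FC']
print(nfc_code_points(e_dot_tilde))  # ['U+0117', 'U+0303']

# The same two graphemes measured in UTF-8 bytes after NFC.
print(len(unicodedata.normalize("NFC", u_umlaut).encode("utf-8")))     # 2
print(len(unicodedata.normalize("NFC", e_dot_tilde).encode("utf-8")))  # 4
```

So even in the most compact normalization form, a string of code points (which is what a Wide_Wide_String holds) needs two elements for that one Lithuanian letter, and indexing by code point does not land on letter boundaries any more reliably than indexing UTF-8 by byte does.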