Newsgroups: comp.lang.ada
Date: Wed, 4 Jul 2018 07:43:24 -0700 (PDT)
References: <70c11a71-3832-4f57-8127-f3f1c48a052f@googlegroups.com> <62e38ee4-f72f-4ed8-bef1-952040fb7f8d@googlegroups.com> <64d8b4a1-a92c-4b90-b95c-e821749de969@googlegroups.com> <887212304.552080112.848502.laguest-archeia.com@nntp.aioe.org> <87muvan83x.fsf@adaheads.home> <1449870001.552246132.581310.laguest-archeia.com@nntp.aioe.org>
Message-ID: <5611f9a5-508b-4846-9d53-4a05599f7f53@googlegroups.com>
Subject: Re: Strange crash on custom iterator
From: "Dan'l Miller"

On Wednesday, July 4, 2018 at 9:37:40 AM UTC-5, Dan'l
Miller wrote:
> On Wednesday, July 4, 2018 at 8:27:53 AM UTC-5, Dmitry A. Kazakov wrote:
> > On 2018-07-04 13:30, J-P. Rosen wrote:
> > > On 04/07/2018 at 12:01, Dmitry A. Kazakov wrote:
> > >> But UTF-8 is actually more efficient in most cases than
> > >> Wide_Wide_String. Random string indexing is practically never used.
> > > !!!! I, and many others, often need to search substrings within a
> > > string; actually, I would have a hard time finding an example of string
> > > manipulation without indexing...
> > >
> > >>> We discussed that point, and the agreement was that making a different
> > >>> type would force the user to many conversions that would bring nothing
> > >>> but trouble, and make Ada once again look impractical out of excessive
> > >>> purism.
> > >>
> > >> Exactly my point. Explicit conversion are necessary because Ada's type
> > >> system is unable to model strings in a type-safe way.
> > > So, you want different types, plus a typing system that would allow to
> > > mix the types and make them compatible.
> >
> > Yes, because they are semantically same: arrays of code points.
> >
> > > .. You might as well put
> > > everything in the same type!
> >
> > No, because they must have different representations.
> >
> > > Anyway, the ARG has to deal with Ada as it is, not as Dmitry dreams it
> > > should be...
> >
> > It requires someone more influential, wise and knowledgeable than me to
> > make and then push such a proposal. I would be satisfied if more people
> > saw the roots of problems with strings etc.
>
> I think that perhaps /all/ readers of this see at least one •problem• with UTF-8 (and perhaps Unicode/ISO10646 in general in Ada, regardless of choice of encoding) in Ada's String (and perhaps Wide_String and Wide_Wide_String too).
>
> The difficulty is that •no one• has the single •solution• for this problem or these concomitant problems. Not even J-P. Rosen possesses a complete solution in his Wide_Wide_String recommendation, because his replies seem to imply, factually incorrectly, that there exists a fully-normalized single-codepoint character in Unicode/ISO10646 for each grapheme/letter. The following article provides 7 examples in 4 languages (2 of which are European languages, no less!) where a single grapheme's most-compact representation in Unicode/ISO10646 is a multi-codepoint sequence.
>
> The absolutely most infamous of these 7 examples is the Lithuanian one. Because, through flukes of sociopolitical history, Vietnamese, French, German, and so forth all had pre-1992 ISO standards or IBM-Microsoft-Apple code-pages for their letters with diacritics, their languages' letters with diacritics got standardized in Unicode/ISO10646 as single codepoints, e.g., ü as U+00FC instead of u U+0075 followed by the combining diaeresis U+0308. Poor old Lithuania was under Soviet occupation from 1944 to 1991, during which the Soviets tried to suppress the Lithuanian language. Due to this suppression, the Soviet character-encoding standards never standardized encodings for Lithuanian letters with all the Lithuanian-specific diacritical marks, such as the 2 example letters given in the article linked above. Because the timespan was so short from the Soviet occupation leaving Lithuania in 1991 to the 1992 cut-off for pre-existing character-encoding standards whose characters Unicode/ISO10646 had to encode as single codepoints, poor old Lithuanian characters are 2nd-class citizens in Unicode/ISO10646, whereas all the Western European languages (and their former colonies) with diacritical marks are first-class citizens in Unicode/ISO10646. This has caused a somewhat protracted, slow-motion, multidecade trench war between Lithuania and Unicode/ISO10646 over this issue, made worse every time someone elsewhere on the planet whips up a brand-new character-with-single-codepoint that has never ever existed in the history of humankind and then standardizes this brand-new contrived grapheme-with-single-codepoint in Unicode/ISO10646.
>
> Oh, but Japan and Silicon Valley can devise emojis galore in recent years and not be restricted by strict enforcement of this no-preexisting-character-encoding rule. Why? I guess because emojis are cool, but Lithuanian characters are booooorrrrrrrring.

Oh, it would help if I would press the paste key: http://unicode.org/standard/where
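
For anyone who wants to see the single-codepoint versus multi-codepoint difference concretely, here is a minimal sketch in Python (not Ada, only because the standard unicodedata module makes it a few lines). It normalizes two sequences to NFC, the composed form, and lists the code points that remain. The helper name nfc_codepoints is hypothetical, and the second sample (e with dot above plus acute accent) is an illustrative letter chosen here, not necessarily one of the examples in the linked article.

```python
import unicodedata


def nfc_codepoints(text: str) -> list[str]:
    """Return the code points of text after NFC normalization, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


# German u-umlaut written as base letter + COMBINING DIAERESIS (U+0308).
u_umlaut = "u\u0308"

# Illustrative sample (an assumption, not taken from the article):
# e with dot above (U+0117) + COMBINING ACUTE ACCENT (U+0301).
e_dot_acute = "\u0117\u0301"

# The first sequence has a precomposed form, so NFC collapses it to one code point.
print(nfc_codepoints(u_umlaut))

# The second has no precomposed form, so NFC leaves it as two code points.
print(nfc_codepoints(e_dot_acute))
```

The same effect appears in any fixed-width representation: a Wide_Wide_String holding the second letter would still need two elements for one grapheme, so indexing by element is not indexing by letter.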