From: "Dan'l Miller"
Newsgroups: comp.lang.ada
Subject: Re: Strange crash on custom iterator
Date: Wed, 4 Jul 2018 09:55:06 -0700 (PDT)
Message-ID: <64454862-b293-4ed7-9c3e-c8a1252344db@googlegroups.com>
In-Reply-To: <5de5f768-40bf-4518-a647-22788658de74@googlegroups.com>

On Wednesday, July 4, 2018 at 10:41:49 AM UTC-5, Lucretia wrote:
> On Wednesday, 4 July 2018 15:37:40 UTC+1, Dan'l Miller wrote:
>
> > The difficulty is that •no one• has the single •solution• for this problem or these concomitant
> > problems.  Not even J-P. Rosen is a possessor of complete solution in his Wide_Wide_String
> > recommendation, because his replies seem to factually-incorrectly imply that there exists a fully
> > normalized single-codepoint character in Unicode/ISO10646 for each grapheme/letter.
>
> JP Rosen told me to go read the AI on the matter, which I did. He states they talked about it, there's not
> much talking in the AI at all! Bob Dewar states they shouldn't really abuse the *String types by subtyping
> and does exactly that by introducing a package he wrote to handle UTF using those subtypes. The rest
> of the AI is about how to fit that into the standard.
>
> Back then, they should've chosen the Unicode standard over the ISO10646 as it's freely available, yes
> the encodings are interchangeable, but that's not really the point.

1) As a fellow ISO standard (ISO8652), Ada is compelled by ISO rules to comply with ISO standards (instead of those of other standards bodies) when an ISO standard exists for that topic.

2) In the end, what difference would it actually make to Ada if the ARG treated Unicode, rather than ISO10646, as the normative reference?  The Unicode-specific extensions (e.g., bidirectional algorithms) are higher in the food chain than anything Ada's libraries (or language) have ever bitten off to chew.

> They should've decided to obsolete the current mess, the same way they did with ASCII and made String
> and Unbounded_String UTF-8 encoded. They could still have the old latin based strings as compatibility
> types. They should've made all source be encoded the same way, which they did anyway for the iso
> spec.
>
> Then defined a bunch of iterators for the types based on code points, grapheme clusters, word/line
> boundaries, bidi, etc.

Yes, parsing/decoding iterators over UTF-8 and UTF-16 would be awesome.  Where-is-the-next-fully-formed-grapheme iterators would be awesome for UTF-32 and UCS4 too, to ease the processing of combining characters (both in never-single-codepoint graphemes and in not-normalized-but-could-have-been multi-codepoint sequences).  But then again, why bother waiting a decade or two for the standard library?  Ada could have a Boost-esque library outside of the ISO8652 standard, where, say, Luke & Dmitry contribute such a better solution.
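
To make the two kinds of iterator concrete, here is a minimal sketch, in Python rather than Ada purely for illustration.  The names `code_points` and `approximate_graphemes` are hypothetical, not from any Ada or Unicode library, and the grapheme grouping is a deliberate simplification: it only attaches combining marks to the preceding base character, whereas full grapheme-cluster segmentation (as specified in Unicode Standard Annex #29) has many more rules.

```python
import unicodedata
from typing import Iterator


def code_points(data: bytes) -> Iterator[str]:
    """Yield one code point at a time from UTF-8 encoded bytes."""
    # bytes.decode validates the input and raises UnicodeDecodeError
    # on malformed UTF-8.
    yield from data.decode("utf-8")


def approximate_graphemes(text: str) -> Iterator[str]:
    """Yield each base character together with its trailing combining marks.

    Assumption: this is only an approximation of grapheme clusters; it
    ignores Hangul jamo, emoji sequences, and the other UAX #29 rules.
    """
    cluster = ""
    for ch in text:
        # A character with combining class 0 starts a new cluster.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


# "e" + combining acute could have been normalized to one code point;
# "q" + combining tilde has no single-codepoint form at all.
sample = "e\u0301q\u0303"
clusters = list(approximate_graphemes(sample))
nfc_lengths = [len(unicodedata.normalize("NFC", c)) for c in clusters]
```

Under these assumptions `clusters` holds two graphemes, and `nfc_lengths` shows that the first collapses to a single code point under NFC normalization while the second stays at two, which is exactly the case that a one-codepoint-per-letter view of Wide_Wide_String misses.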