From: Shark8
Newsgroups: comp.lang.ada
Subject: Re: Unicode string comparision functions
Date: Thu, 12 Nov 2015 12:07:07 -0800 (PST)
References: <00aab01c-7d18-408a-9a4c-feb80ac9a1e1@googlegroups.com>

On Thursday, November 12, 2015 at 12:46:22 PM UTC-7, Randy Brukardt wrote:
> "Shark8" wrote in message
>
> >I thought I had come across a unicode Equals_Case_Insensitive
> >(and less than) for unicode using Wide_Wide_Strings some time
> >ago, but I cannot seem to find them again; am I misremembering,
> >or were they in a really odd place?
>
> Not an odd place, but they have their own subclause (A.4.10).

Thank you for the ref.

>
> >For this particular application I would rather use Wide_Wide_String than
> > Wide_String so I wouldn't have to worry about invalid character
> > [sequences] for the non-ASCII characters. (And, while UTF-8 encoded
> > strings have the nice property of being endian agnostic, they still
> > have that property.) -- But I suppose the main thing is to have a good
> > case insensitive compare such that PRUSSIAN and Prußian are
> > considered equal.
>
> Sorry, the language-defined equality won't do that. It uses
> "locale-independent simple case folding", which means that strings of
> different lengths are always different. (That's the same case comparison
> that's used for Ada identifiers.)
>
> The much more complex "locale-independent full case folding" is not provided
> by the language, we didn't want to inflict that level of pain on Ada
> implementers (especially as the need was unclear).

I can see why, and certainly don't begrudge that decision -- Unicode is, IMO, a terrible 'solution' to the problem of multiple languages.

I thought I read something in the rationale that implied the full case folding was to be used, at least with respect to identifiers in Ada's own source code... and so mistakenly thought that Equal_Case_Insensitive would do so (after all, if the compiler itself requires that functionality there's little reason not to provide access to it).

>
> The AARM note A.4.10(3.a/3) gives a bit of background.

I'll have to read that. Thank you.
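
For anyone following along, here is a rough sketch of the simple-vs-full folding distinction Randy describes. It is in Python rather than Ada, only because Python's built-in str.casefold() performs Unicode full case folding and so makes the difference easy to see. The helper names (full_fold, simple_fold_approx, equal_full, equal_simple) are made up for this illustration, and simple_fold_approx is only an approximation of simple case folding (it folds one character at a time and rejects any result that is not a single character). It is not how any Ada implementation does it.

```python
def full_fold(s: str) -> str:
    # str.casefold() applies Unicode full case folding, which may change
    # the length of the string (for example, U+00DF becomes "ss").
    return s.casefold()


def simple_fold_approx(s: str) -> str:
    # Approximation of simple case folding (an assumption of this sketch):
    # fold each character on its own and accept the result only if it is
    # still exactly one character, so the string length never changes.
    out = []
    for ch in s:
        folded = ch.casefold()
        if len(folded) != 1:
            # Fall back to plain lowercasing when the full fold expands.
            folded = ch.lower()
        out.append(folded if len(folded) == 1 else ch)
    return "".join(out)


def equal_full(a: str, b: str) -> bool:
    # Case-insensitive comparison using full case folding.
    return full_fold(a) == full_fold(b)


def equal_simple(a: str, b: str) -> bool:
    # Case-insensitive comparison using the length-preserving approximation.
    return simple_fold_approx(a) == simple_fold_approx(b)


if __name__ == "__main__":
    a = "PRUSSIAN"
    b = "Pru\u00dfian"  # U+00DF is LATIN SMALL LETTER SHARP S
    print(len(a), len(b))
    print(equal_simple(a, b))
    print(equal_full(a, b))
```

Since the length-preserving fold leaves the two strings at 8 and 7 characters, they can never compare equal under it, which is the point about "strings of different lengths are always different". Only the full fold, which expands the sharp s to "ss", treats them as the same.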