From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.50.61.132 with SMTP id p4mr7662261igr.13.1447358827499;
        Thu, 12 Nov 2015 12:07:07 -0800 (PST)
X-Received: by 10.182.246.66 with SMTP id xu2mr169384obc.18.1447358827479;
 Thu, 12 Nov 2015 12:07:07 -0800 (PST)
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!mx02.eternal-september.org!feeder.eternal-september.org!usenet.blueworldhosting.com!feeder01.blueworldhosting.com!border2.nntp.dca1.giganews.com!nntp.giganews.com!i2no2293043igv.0!news-out.google.com!l1ni2149igd.0!nntp.google.com!i2no967542igv.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Thu, 12 Nov 2015 12:07:07 -0800 (PST)
In-Reply-To: <n22qac$6np$1@loke.gir.dk>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com; posting-host=174.28.149.7;
 posting-account=lJ3JNwoAAAAQfH3VV9vttJLkThaxtTfC
NNTP-Posting-Host: 174.28.149.7
References: <00aab01c-7d18-408a-9a4c-feb80ac9a1e1@googlegroups.com>
 <n22qac$6np$1@loke.gir.dk>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <fdb68ece-f102-481c-af22-6999d29be7a1@googlegroups.com>
Subject: Re: Unicode string comparision functions
From: Shark8 <onewingedshark@gmail.com>
Injection-Date: Thu, 12 Nov 2015 20:07:07 +0000
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Xref: news.eternal-september.org comp.lang.ada:28335
Date: 2015-11-12T12:07:07-08:00
List-Id: <comp.lang.ada>

On Thursday, November 12, 2015 at 12:46:22 PM UTC-7, Randy Brukardt wrote:
> "Shark8" wrote in message
>
> >I thought I had come across a unicode Equals_Case_Insensitive
> >(and less than) for unicode using Wide_Wide_Strings some time
> >ago, but I cannot seem to find them again; am I misremembering,
> >or were they in a really odd place?
>
> Not an odd place, but they have their own subclause (A.4.10).

Thank you for the ref.

>
> >For this particular application I would rather use Wide_Wide_String than
> > Wide_String so I wouldn't have to worry about invalid character
> > [sequences] for the non-ASCII characters. (And, while UTF-8 encoded
> > strings have the nice property of being endian agnostic, they still
> > have that property.) -- But I suppose the main thing is to have a good
> > case insensitive compare such that PRUSSIAN and Prußian are considered
> > equal.
>
> Sorry, the language-defined equality won't do that. It uses
> "locale-independent simple case folding", which means that strings of
> different lengths are always different. (That's the same case comparison
> that's used for Ada identifiers.)
>
> The much more complex "locale-independent full case folding" is not provided
> by the language, we didn't want to inflict that level of pain on Ada
> implementers (especially as the need was unclear).

I can see why, and certainly don't begrudge that decision -- Unicode is, IMO, a terrible 'solution' to the problem of multiple languages.
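
To make the distinction concrete, here is a rough sketch in Python rather than Ada, because Python's built-in str.casefold performs full case folding, which (as noted above) Ada does not provide. The helper names simple_fold, equal_simple, and equal_full are made up for illustration, and simple_fold only approximates simple case folding by keeping one-to-one mappings; it is not how Equal_Case_Insensitive or any Ada implementation is actually written.

```python
def simple_fold(text: str) -> str:
    """Approximate simple case folding (hypothetical helper).

    Each character is folded on its own, and the result is kept only if it
    is still a single character, so the string length never changes.
    """
    out = []
    for ch in text:
        folded = ch.casefold()
        out.append(folded if len(folded) == 1 else ch)
    return "".join(out)


def equal_simple(a: str, b: str) -> bool:
    # Length-preserving comparison: strings of different lengths never match.
    return simple_fold(a) == simple_fold(b)


def equal_full(a: str, b: str) -> bool:
    # str.casefold applies full case folding, which may change the length
    # (for example, U+00DF LATIN SMALL LETTER SHARP S folds to "ss").
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    upper = "PRUSSIAN"
    mixed = "Pru\u00dfian"  # "Prussian" spelled with a sharp s
    print(equal_simple(upper, mixed))
    print(equal_full(upper, mixed))
```

Under the simple-style comparison the two spellings differ because their lengths differ (8 versus 7 characters), while the full-folding comparison treats them as equal.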

I thought I read something in the rationale that implied full case folding was to be used, at least with respect to identifiers in Ada's own source code... and so mistakenly thought that Equal_Case_Insensitive would do so (after all, if the compiler itself requires that functionality, there's little reason not to provide access to it).

>
> The AARM note A.4.10(3.a/3) gives a bit of background.

I'll have to read that.

Thank you.