comp.lang.ada
* Unicode string comparision functions
@ 2015-11-12  4:06 Shark8
  2015-11-12  5:04 ` Jeffrey R. Carter
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Shark8 @ 2015-11-12  4:06 UTC (permalink / raw)


I thought I had come across a Unicode Equals_Case_Insensitive (and a less-than) for Wide_Wide_Strings some time ago, but I cannot seem to find them again; am I misremembering, or were they in a really odd place?

For this particular application I would rather use Wide_Wide_String than Wide_String so I wouldn't have to worry about invalid character [sequences] for the non-ASCII characters. (And, while UTF-8 encoded strings have the nice property of being endian agnostic, they still have that problem.) -- But I suppose the main thing is to have a good case-insensitive compare such that PRUSSIAN and Prußian are considered equal.

Thanks.



* Re: Unicode string comparision functions
  2015-11-12  4:06 Unicode string comparision functions Shark8
@ 2015-11-12  5:04 ` Jeffrey R. Carter
  2015-11-12 20:01   ` Shark8
  2015-11-12 19:46 ` Randy Brukardt
  2015-11-13  6:03 ` Vadim Godunko
  2 siblings, 1 reply; 11+ messages in thread
From: Jeffrey R. Carter @ 2015-11-12  5:04 UTC (permalink / raw)


On 11/11/2015 09:06 PM, Shark8 wrote:
> I thought I had come across a unicode Equals_Case_Insensitive (and less than)
> for unicode using Wide_Wide_Strings some time ago, but I cannot seem to find
> them again; am I misremembering, or were they in a really odd place?

Searching the ARM index

http://www.adaic.org/resources/add_content/standards/12rm/html/RM-0-4.html

for Case_Insensitive, I found Ada.Strings.Wide_Wide_Equal_Case_Insensitive in
ARM A.4.8 in a few seconds.

http://www.adaic.org/resources/add_content/standards/12rm/html/RM-A-4-8.html#I5962

-- 
Jeff Carter
"When Roman engineers built a bridge, they had to stand under it
while the first legion marched across. If programmers today
worked under similar ground rules, they might well find
themselves getting much more interested in Ada!"
Robert Dewar
62



* Re: Unicode string comparision functions
  2015-11-12  4:06 Unicode string comparision functions Shark8
  2015-11-12  5:04 ` Jeffrey R. Carter
@ 2015-11-12 19:46 ` Randy Brukardt
  2015-11-12 20:07   ` Shark8
  2015-11-13  6:03 ` Vadim Godunko
  2 siblings, 1 reply; 11+ messages in thread
From: Randy Brukardt @ 2015-11-12 19:46 UTC (permalink / raw)



"Shark8" <onewingedshark@gmail.com> wrote in message 
news:00aab01c-7d18-408a-9a4c-feb80ac9a1e1@googlegroups.com...
>I thought I had come across a unicode Equals_Case_Insensitive
>(and less than) for unicode using Wide_Wide_Strings some time
>ago, but I cannot seem to find them again; am I misremembering,
>or were they in a really odd place?

Not an odd place, but they have their own subclause (A.4.10).

>For this particular application I would rather use Wide_Wide_String than
> Wide_String so I wouldn't have to worry about invalid character 
> [sequences]
> for the non-ASCII characters. (And, while UTF-8 encoded strings have the
> nice property of being endian agnostic, they still have that property.) --  
> But I
> suppose the main thing is to have a good case insensitive compare such 
> that
> PRUSSIAN and Prußian are considered equal.

Sorry, the language-defined equality won't do that. It uses 
"locale-independent simple case folding", which means that strings of 
different lengths are always different. (That's the same case comparison 
that's used for Ada identifiers.)

The much more complex "locale-independent full case folding" is not provided 
by the language, we didn't want to inflict that level of pain on Ada 
implementers (especially as the need was unclear).

The AARM note A.4.10(3.a/3) gives a bit of background.
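
To make the length argument concrete, here is a minimal sketch using the plain String function Ada.Strings.Equal_Case_Insensitive from A.4.10. The procedure and constant names are illustrative only, and the sharp s is written by code point (an assumption made to keep the example independent of the source file encoding). It is not meant as anything more than a demonstration of the one-to-one mapping.

```ada
with Ada.Text_IO;
with Ada.Strings.Equal_Case_Insensitive;

procedure Simple_Folding_Demo is
   --  LATIN SMALL LETTER SHARP S (Latin-1 position 16#DF#), given by
   --  code point so the example does not depend on source encoding.
   Sharp_S : constant Character := Character'Val (16#DF#);

   Upper : constant String := "PRUSSIAN";             --  8 characters
   Mixed : constant String := "Pru" & Sharp_S & "ian"; --  7 characters
begin
   --  Same length, differing only in case: reported as equal.
   Ada.Text_IO.Put_Line
     (Boolean'Image
        (Ada.Strings.Equal_Case_Insensitive ("PRUSSIAN", "prussian")));

   --  Different lengths: simple case folding maps one character to one
   --  character, so these can never compare equal.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Ada.Strings.Equal_Case_Insensitive (Upper, Mixed)));
end Simple_Folding_Demo;
```

The second comparison is False because simple case folding never changes the length of a string, whereas full case folding would expand the sharp s to "ss" before comparing.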

                                       Randy.







* Re: Unicode string comparision functions
  2015-11-12  5:04 ` Jeffrey R. Carter
@ 2015-11-12 20:01   ` Shark8
  2015-11-12 22:33     ` Jeffrey R. Carter
  0 siblings, 1 reply; 11+ messages in thread
From: Shark8 @ 2015-11-12 20:01 UTC (permalink / raw)


On Wednesday, November 11, 2015 at 10:04:23 PM UTC-7, Jeffrey R. Carter wrote:
> On 11/11/2015 09:06 PM, Shark8 wrote:
> > I thought I had come across a unicode Equals_Case_Insensitive (and less than)
> > for unicode using Wide_Wide_Strings some time ago, but I cannot seem to find
> > them again; am I misremembering, or were they in a really odd place?
> 
> Searching the ARM index
> 
> http://www.adaic.org/resources/add_content/standards/12rm/html/RM-0-4.html
> 
> for Case_Insensitive, I found Ada.Strings.Wide_Wide_Equal_Case_Insensitive in
> ARM A.4.8 in a few seconds.

...Wide_Wide_Equal_Case_Insensitive --- Testing it out, the compiler's giving me an error saying ""Ada.Strings.Wide_Wide_Equal_Case_Insensitive" is not a predefined library unit", which is good in that it saves me from feeling **REALLY** stupid.

> 
> http://www.adaic.org/resources/add_content/standards/12rm/html/RM-A-4-8.html#I5962

Thanks for the ref.



* Re: Unicode string comparision functions
  2015-11-12 19:46 ` Randy Brukardt
@ 2015-11-12 20:07   ` Shark8
  2015-11-12 21:35     ` Randy Brukardt
  0 siblings, 1 reply; 11+ messages in thread
From: Shark8 @ 2015-11-12 20:07 UTC (permalink / raw)


On Thursday, November 12, 2015 at 12:46:22 PM UTC-7, Randy Brukardt wrote:
> "Shark8" wrote in message 
> 
> >I thought I had come across a unicode Equals_Case_Insensitive
> >(and less than) for unicode using Wide_Wide_Strings some time
> >ago, but I cannot seem to find them again; am I misremembering,
> >or were they in a really odd place?
> 
> Not an odd place, but they have their own subclause (A.4.10).

Thank you for the ref.

> 
> >For this particular application I would rather use Wide_Wide_String than
> > Wide_String so I wouldn't have to worry about invalid character 
> > [sequences]
> > for the non-ASCII characters. (And, while UTF-8 encoded strings have the
> > nice property of being endian agnostic, they still have that property.) --  
> > But I
> > suppose the main thing is to have a good case insensitive compare such 
> > that
> > PRUSSIAN and Prußian are considered equal.
> 
> Sorry, the language-defined equality won't do that. It uses 
> "locale-independent simple case folding", which means that strings of 
> different lengths are always different. (That's the same case comparison 
> that's used for Ada identifiers.)
> 
> The much more complex "locale-independent full case folding" is not provided 
> by the language, we didn't want to inflict that level of pain on Ada 
> implementers (especially as the need was unclear).

I can see why, and certainly don't begrudge that decision -- Unicode is, IMO, a terrible 'solution' to the problem of multiple languages.

I thought I read something in the rationale that implied the full case folding was to be used, at least with respect to identifiers in Ada's own source code... and so mistakenly thought that Equal_Case_Insensitive would do so (after all, if the compiler itself requires that functionality, there's little reason not to provide access to it).

> 
> The AARM note A.4.10(3.a/3) gives a bit of background.

I'll have to read that.

Thank you.



* Re: Unicode string comparision functions
  2015-11-12 20:07   ` Shark8
@ 2015-11-12 21:35     ` Randy Brukardt
  0 siblings, 0 replies; 11+ messages in thread
From: Randy Brukardt @ 2015-11-12 21:35 UTC (permalink / raw)


"Shark8" <onewingedshark@gmail.com> wrote in message 
news:fdb68ece-f102-481c-af22-6999d29be7a1@googlegroups.com...
...
> I thought I read something in the rationale that implied the
> full case folding was to be used, at least with respect to identifiers
> in Ada's own source-code... and so mistakenly thought the
> Equal_Case_Insensitive would do so (after all, if the compiler itself
> requires that functionality there's little reason not to provide access to 
> it).

That was once the case, until it was discovered that doing that was 
incompatible and inconsistent with Ada 95 code. In particular, full case 
folding could consider two identifiers the same that Ada 95 considered 
distinct (your example would be exactly one such case). And that could 
silently change the meaning of a program (a different object could be used 
if the "right" nested declarations were used). That was too nasty to 
contemplate, so we changed to simple case folding (at that point, no one had 
implemented the rules correctly, so we figured that we could just change 
them).

There's also a bizarre rule for enumeration types so that the 'Image values 
are distinct, else 'Value couldn't work as expected. I've never tried to 
work out an example (probably Turkish would show it with the dotless I).

                                       Randy.




* Re: Unicode string comparision functions
  2015-11-12 20:01   ` Shark8
@ 2015-11-12 22:33     ` Jeffrey R. Carter
  2015-11-13  0:10       ` Randy Brukardt
  0 siblings, 1 reply; 11+ messages in thread
From: Jeffrey R. Carter @ 2015-11-12 22:33 UTC (permalink / raw)


On 11/12/2015 01:01 PM, Shark8 wrote:
> 
> ...Wide_Wide_Equal_Case_Insensitive --- Testing it out, the compiler's giving
> me an error saying ""Ada.Strings.Wide_Wide_Equal_Case_Insensitive" is not a
> predefined library unit", which is good in that it saves me from feeling
> **REALLY** stupid.

ARM A.4.8 clearly indicates that there should be such a library function,
similar to Ada.Strings.Equal_Case_Insensitive from A.4.10 for String.

They were added by Ada 12, so if you're using a compiler for an earlier version
they probably won't exist. If you are using an Ada-12 compiler, you should
probably complain.

-- 
Jeff Carter
"Sons of a silly person."
Monty Python & the Holy Grail
02



* Re: Unicode string comparision functions
  2015-11-12 22:33     ` Jeffrey R. Carter
@ 2015-11-13  0:10       ` Randy Brukardt
  2015-11-13  8:22         ` Simon Wright
  0 siblings, 1 reply; 11+ messages in thread
From: Randy Brukardt @ 2015-11-13  0:10 UTC (permalink / raw)


"Jeffrey R. Carter" <spam.jrcarter.not@spam.not.acm.org> wrote in message 
news:n233vr$1h5$2@dont-email.me...
> On 11/12/2015 01:01 PM, Shark8 wrote:
>>
>> ...Wide_Wide_Equal_Case_Insensitive --- Testing it out, the compiler's 
>> giving
>> me an error saying ""Ada.Strings.Wide_Wide_Equal_Case_Insensitive" is not 
>> a
>> predefined library unit", which is good in that it saves me from feeling
>> **REALLY** stupid.
>
> ARM A.4.8 clearly indicates that there should be such a library function,
> similar to Ada.Strings.Equal_Case_Insensitive from A.4.10 for String.
>
> They were added by Ada 12, so if you're using a compiler for an earlier 
> version
> they probably won't exist. If you are using an Ada-12 compiler, you should
> probably complain.

Right, assuming you spelled it right (with the Wide Wide madness, spelling 
it wrong isn't that uncommon!).

                                  Randy.




* Re: Unicode string comparision functions
  2015-11-12  4:06 Unicode string comparision functions Shark8
  2015-11-12  5:04 ` Jeffrey R. Carter
  2015-11-12 19:46 ` Randy Brukardt
@ 2015-11-13  6:03 ` Vadim Godunko
  2015-11-13 17:43   ` Shark8
  2 siblings, 1 reply; 11+ messages in thread
From: Vadim Godunko @ 2015-11-13  6:03 UTC (permalink / raw)


On Thursday, November 12, 2015 at 7:06:25 AM UTC+3, Shark8 wrote:
> 
> But I suppose the main thing is to have a good case insensitive compare such that PRUSSIAN and Prußian are considered equal.
> 

You can try Matreshka's Universal_String and its implementation of locale-dependent collation.



* Re: Unicode string comparision functions
  2015-11-13  0:10       ` Randy Brukardt
@ 2015-11-13  8:22         ` Simon Wright
  0 siblings, 0 replies; 11+ messages in thread
From: Simon Wright @ 2015-11-13  8:22 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Jeffrey R. Carter" <spam.jrcarter.not@spam.not.acm.org> wrote in message 
> news:n233vr$1h5$2@dont-email.me...
>> On 11/12/2015 01:01 PM, Shark8 wrote:
>>>
>>> ...Wide_Wide_Equal_Case_Insensitive --- Testing it out, the
>>> compiler's giving me an error saying
>>> ""Ada.Strings.Wide_Wide_Equal_Case_Insensitive" is not a predefined
>>> library unit", which is good in that it saves me from feeling
>>> **REALLY** stupid.
>>
>> ARM A.4.8 clearly indicates that there should be such a library
>> function, similar to Ada.Strings.Equal_Case_Insensitive from A.4.10
>> for String.
>>
>> They were added by Ada 12, so if you're using a compiler for an
>> earlier version they probably won't exist. If you are using an Ada-12
>> compiler, you should probably complain.
>
> Right, assuming you spelled it right (with the Wide Wide madness,
> spelling it wrong isn't that uncommon!).

GNAT GPL 2015 provides Equal_Case_Insensitive only in plain Strings,
Bounded, Fixed, Unbounded.
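
Where the Wide_Wide library function is missing, one possible stopgap is a hand-written comparison built on Ada.Wide_Wide_Characters.Handling.To_Lower. The sketch below is only an approximation: Equal_Ignoring_Case is a hypothetical helper, not a language-defined unit, and it assumes that per-character lower-casing is close enough to simple case folding for the data at hand, which the RM does not guarantee.

```ada
with Ada.Text_IO;
with Ada.Wide_Wide_Characters.Handling;

procedure Fallback_Compare_Demo is

   --  Hypothetical helper: compares character by character after
   --  lower-casing each one. Not the language-defined function.
   function Equal_Ignoring_Case
     (Left, Right : Wide_Wide_String) return Boolean
   is
      use Ada.Wide_Wide_Characters.Handling;
   begin
      --  A one-to-one mapping cannot match strings of unequal length.
      if Left'Length /= Right'Length then
         return False;
      end if;

      --  Walk both strings by offset, since their bounds may differ.
      for Offset in 0 .. Left'Length - 1 loop
         if To_Lower (Left (Left'First + Offset))
           /= To_Lower (Right (Right'First + Offset))
         then
            return False;
         end if;
      end loop;

      return True;
   end Equal_Ignoring_Case;

   --  LATIN SMALL LETTER SHARP S, given by code point.
   Sharp_S : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#DF#);

   Upper : constant Wide_Wide_String := "PRUSSIAN";
   Mixed : constant Wide_Wide_String := "Pru" & Sharp_S & "ian";

begin
   Ada.Text_IO.Put_Line
     (Boolean'Image (Equal_Ignoring_Case ("Ada", "ADA")));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Equal_Ignoring_Case (Upper, Mixed)));
end Fallback_Compare_Demo;
```

Like the language-defined function, this still treats PRUSSIAN and Prußian as different, because the lengths differ; matching them needs full case folding from a library outside the standard.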


* Re: Unicode string comparision functions
  2015-11-13  6:03 ` Vadim Godunko
@ 2015-11-13 17:43   ` Shark8
  0 siblings, 0 replies; 11+ messages in thread
From: Shark8 @ 2015-11-13 17:43 UTC (permalink / raw)


On Thursday, November 12, 2015 at 11:03:59 PM UTC-7, Vadim Godunko wrote:
> 
> You can try Matreshka's Universal_String and its implementation of locale-dependent collation.

That's awesome. I'd have to check out the licensing, though, as the stuff I'm playing around with could end up in another project... and it'd be nice if, for that project, we could have complete discretion in choosing licenses, coupled with the fewest dependencies possible.


