Re: string and wide string usage

comp.lang.ada
 help / color / mirror / Atom feed

From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: string and wide string usage
Date: Thu, 7 Mar 2013 17:53:25 -0600
Date: 2013-03-07T17:53:25-06:00	[thread overview]
Message-ID: <khb99p$98h$1@munin.nbi.dk> (raw)
In-Reply-To: 5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com

"ytomino" <aghia05@gmail.com> wrote in message 
news:5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com...
> On Thursday, March 7, 2013 8:12:01 PM UTC+9, Ali Bendriss wrote:
>> I've got some problem with some string in example:
>> a base 64 encoded string
>> V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=
>> wich decode to 'Windows\xa07 Professionnel N' in utf-8
>> every thing is working if I feed directly the database, but if want to
>> apply Ada.Characters.Handling.To_Lower on the string before feeding the
>> database postgres is not happy
>> 'ERROR:  invalid byte sequence for encoding "UTF8": 0xe2 0xa0 0x37'
>> it's not really a big deal, but I would like to understand where the
>> problem is. Do I have to use wide string ?
>
> Because functions in Ada.Characters.Handling take not UTF-8 but Latin-1.

Right. The proper thing to do (for Ada 2012) is to use 
Ada.Characters.Wide_Handling (or Wide_Wide_Handling) to do the case 
conversion, after converting the UTF-8 into a Wide_String (or 
Wide_Wide_String).

If you're trying to do this in an older version of Ada, you'll have to find 
some library somewhere to do the job.

But I want to caution you that "converting to lower case" is not a great 
idea if you plan to support arbitrary Unicode strings. Such conversions are 
somewhat ambiguous, and tend to make strings appear similar that are 
different (and sometimes the reverse happens as well). Usually, the best 
plan is to store the strings unmodified and use Equal_Case_Insensitive to 
compare them (this uses the most accurate comparison defined by Unicode, and 
has the advantage of being guarenteed not to change in future character set 
standards, which is NOT true of conversion to lower case).

There is a nice example of this problem in the next chapter of the Ada 2012 
Rationale (although you'll have to wait untiil May to see it, unless you get 
the Ada User Journal).

I realize you may have no choice given the design of your database might not 
be in your control, and it might not matter if you don't plan to have Greek 
and Turkish characters in your data (to mention two of the most common where 
convert to lower case and Equal_Case_Insensitive give different answers for 
Wide_Strings).

                                     Randy.

next prev parent reply	other threads:[~2013-03-07 23:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-07 11:12 string and wide string usage Ali Bendriss
2013-03-07 14:20 ` ytomino
2013-03-07 17:14   ` Dmitry A. Kazakov
2013-03-07 23:53   ` Randy Brukardt [this message]
2013-03-08  2:05     ` Yannick Duchêne (Hibou57)
2013-03-08  3:07       ` Randy Brukardt
2013-03-07 17:48 ` Vadim Godunko

replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox