From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: string and wide string usage
Date: Thu, 7 Mar 2013 17:53:25 -0600
Message-ID: <khb99p$98h$1@munin.nbi.dk>
In-Reply-To: 5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com
"ytomino" <aghia05@gmail.com> wrote in message
news:5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com...
> On Thursday, March 7, 2013 8:12:01 PM UTC+9, Ali Bendriss wrote:
>> I've got some problem with some string in example:
>> a base 64 encoded string
>> V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=
>> which decodes to 'Windows\xa07 Professionnel N' in UTF-8.
>> Everything works if I feed the database directly, but if I want to
>> apply Ada.Characters.Handling.To_Lower on the string before feeding the
>> database, postgres is not happy:
>> 'ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa0 0x37'
>> it's not really a big deal, but I would like to understand where the
>> problem is. Do I have to use wide string ?
>
> Because functions in Ada.Characters.Handling take not UTF-8 but Latin-1.
Right. The proper thing to do (for Ada 2012) is to use
Ada.Wide_Characters.Handling (or Ada.Wide_Wide_Characters.Handling) to do the
case conversion, after converting the UTF-8 into a Wide_String (or
Wide_Wide_String).
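
As a rough illustration of that decode / convert / re-encode sequence, here is
a minimal sketch assuming an Ada 2012 compiler that provides
Ada.Strings.UTF_Encoding.Wide_Strings. The procedure name Lower_UTF8 and the
hard-coded input are made up for the example; the input is the string from the
original question written out as raw UTF-8 bytes.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Wide_Characters.Handling;

procedure Lower_UTF8 is
   --  Hypothetical input: "Windows<NBSP>7 Professionnel N" as UTF-8 bytes.
   --  The no-break space U+00A0 is encoded as the two bytes C2 A0.
   Input : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     "Windows" & Character'Val (16#C2#) & Character'Val (16#A0#)
     & "7 Professionnel N";

   --  Step 1: decode the UTF-8 bytes into Wide_Characters.
   Decoded : constant Wide_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Decode (Input);

   --  Step 2: case-convert characters, not bytes.
   Lowered : constant Wide_String :=
     Ada.Wide_Characters.Handling.To_Lower (Decoded);

   --  Step 3: encode back to UTF-8 before handing it to the database.
   Output : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Lowered);
begin
   Ada.Text_IO.Put_Line (Output);
end Lower_UTF8;
```

Because To_Lower now sees U+00A0 as a single character rather than the two
Latin-1 characters 16#C2# and 16#A0#, the lead byte is no longer turned into
16#E2#, which is what produced the invalid sequence in the original report.
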
If you're trying to do this in an older version of Ada, you'll have to find
some library somewhere to do the job.
But I want to caution you that "converting to lower case" is not a great
idea if you plan to support arbitrary Unicode strings. Such conversions are
somewhat ambiguous, and tend to make different strings appear the same
(and sometimes the reverse happens as well). Usually, the best
plan is to store the strings unmodified and use Equal_Case_Insensitive to
compare them (this uses the most accurate comparison defined by Unicode, and
has the advantage of being guaranteed not to change in future character set
standards, which is NOT true of conversion to lower case).
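
A minimal sketch of that approach, assuming the strings are kept as UTF-8 and
that the compiler provides the Ada 2012 library function
Ada.Strings.Wide_Equal_Case_Insensitive (the Wide_String counterpart of
Equal_Case_Insensitive). The names Compare_Names and Same_Name are
hypothetical, chosen only for this example.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Strings.Wide_Equal_Case_Insensitive;

procedure Compare_Names is

   --  Hypothetical helper: compare two UTF-8 strings without regard to
   --  case, leaving the stored strings themselves unmodified.
   function Same_Name
     (Left, Right : Ada.Strings.UTF_Encoding.UTF_8_String) return Boolean is
   begin
      return Ada.Strings.Wide_Equal_Case_Insensitive
        (Ada.Strings.UTF_Encoding.Wide_Strings.Decode (Left),
         Ada.Strings.UTF_Encoding.Wide_Strings.Decode (Right));
   end Same_Name;

begin
   Ada.Text_IO.Put_Line
     (Boolean'Image (Same_Name ("Professionnel", "PROFESSIONNEL")));
end Compare_Names;
```

The point of the design is that nothing is rewritten on the way into the
database; case is ignored only at comparison time.
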
There is a nice example of this problem in the next chapter of the Ada 2012
Rationale (although you'll have to wait until May to see it, unless you get
the Ada User Journal).
I realize you may have no choice, since the design of your database might not
be in your control, and it might not matter if you don't plan to have Greek
and Turkish characters in your data (to mention two of the most common cases
where converting to lower case and Equal_Case_Insensitive give different
answers for Wide_Strings).
Randy.