From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: a07f3367d7,802ccdc10f849020
X-Google-Attributes: gida07f3367d7,public,usenet
X-Google-NewGroupId: yes
X-Google-Language: ENGLISH,ASCII-7-bit
X-Received: by 10.180.98.102 with SMTP id eh6mr3033663wib.7.1363061293648;
        Mon, 11 Mar 2013 21:08:13 -0700 (PDT)
MIME-Version: 1.0
Path: 
 g1ni60769wig.0!nntp.google.com!feeder1.cambriumusenet.nl!82.197.223.108.MISMATCH!feeder2.cambriumusenet.nl!feed.tweaknews.nl!216.40.29.245.MISMATCH!novia!border4.nntp.dca.giganews.com!border2.nntp.dca.giganews.com!backlog2.nntp.ams.giganews.com!border4.nntp.ams.giganews.com!border2.nntp.ams.giganews.com!nntp.giganews.com!weretis.net!feeder1.news.weretis.net!feeder4.news.weretis.net!nuzba.szn.dk!news.jacob-sparre.dk!munin.jacob-sparre.dk!pnx.dk!.POSTED!not-for-mail
From: "Randy Brukardt" <randy@rrsoftware.com>
Newsgroups: comp.lang.ada
Subject: Re: string and wide string usage
Date: Thu, 7 Mar 2013 17:53:25 -0600
Organization: Jacob Sparre Andersen Research & Innovation
Message-ID: <khb99p$98h$1@munin.nbi.dk>
References: <kh9sm1$tj4$1@dont-email.me>
 <5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com>
NNTP-Posting-Host: static-69-95-181-76.mad.choiceone.net
X-Trace: munin.nbi.dk 1362700409 9489 69.95.181.76 (7 Mar 2013 23:53:29 GMT)
X-Complaints-To: news@jacob-sparre.dk
NNTP-Posting-Date: Thu, 7 Mar 2013 23:53:29 +0000 (UTC)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
X-Original-Bytes: 3354
Date: 2013-03-07T17:53:25-06:00
List-Id: <comp.lang.ada>

"ytomino" <aghia05@gmail.com> wrote in message 
news:5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com...
> On Thursday, March 7, 2013 8:12:01 PM UTC+9, Ali Bendriss wrote:
>> I've got some problem with some string in example:
>> a base 64 encoded string
>> V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=
>> wich decode to 'Windows\xa07 Professionnel N' in utf-8
>> every thing is working if I feed directly the database, but if want to
>> apply Ada.Characters.Handling.To_Lower on the string before feeding the
>> database postgres is not happy
>> 'ERROR:  invalid byte sequence for encoding "UTF8": 0xe2 0xa0 0x37'
>> it's not really a big deal, but I would like to understand where the
>> problem is. Do I have to use wide string ?
>
> Because functions in Ada.Characters.Handling take not UTF-8 but Latin-1.

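Right. The proper thing to do (for Ada 2012) is to use 
Ada.Wide_Characters.Handling (or Ada.Wide_Wide_Characters.Handling) to do the 
case conversion, after converting the UTF-8 into a Wide_String (or 
Wide_Wide_String), for instance with the Decode functions in 
Ada.Strings.UTF_Encoding.Wide_Strings (or Wide_Wide_Strings).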
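
To see where the error in the question comes from, here is a small 
byte-level sketch. It is written in Python rather than Ada, purely as an 
illustration; the helper names (latin1_lower, utf8_lower, is_valid_utf8) are 
made up for this example, and Python's str.lower is only standing in for the 
Ada routines. Treating the UTF-8 bytes as Latin-1 turns the lead byte 0xC2 
("A with circumflex" in Latin-1) into 0xE2, which is how the sequence 
0xe2 0xa0 0x37 from the postgres message arises.

```python
import base64

# The base64 payload quoted in the original question.
raw = base64.b64decode("V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=")

def latin1_lower(data: bytes) -> bytes:
    """Lower-case each byte as if it were one Latin-1 character.

    This mimics (as an assumption) what a Latin-1 To_Lower does when it is
    handed UTF-8 encoded bytes.
    """
    return data.decode("latin-1").lower().encode("latin-1")

def utf8_lower(data: bytes) -> bytes:
    """Decode the UTF-8 first, lower-case the characters, then re-encode."""
    return data.decode("utf-8").lower().encode("utf-8")

def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

broken = latin1_lower(raw)   # 0xC2 0xA0 has become 0xE2 0xA0
fixed = utf8_lower(raw)      # the no-break space is left untouched

print(is_valid_utf8(broken))
print(is_valid_utf8(fixed))
```

The second function follows the same order of operations as suggested above: 
decode, convert case on real characters, then encode again.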

If you're trying to do this in an older version of Ada, you'll have to find 
some library somewhere to do the job.

But I want to caution you that "converting to lower case" is not a great 
idea if you plan to support arbitrary Unicode strings. Such conversions are 
somewhat ambiguous, and tend to make strings appear similar that are 
different (and sometimes the reverse happens as well). Usually, the best 
plan is to store the strings unmodified and use Equal_Case_Insensitive to 
compare them (this uses the most accurate comparison defined by Unicode, and 
has the advantage of being guaranteed not to change in future character set 
standards, which is NOT true of conversion to lower case).
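
A rough illustration of the difference, again as a Python sketch rather 
than Ada. The assumptions here are that a per-character str.lower stands in 
for a simple To_Lower mapping, and that str.casefold (Unicode case folding) 
stands in for the kind of comparison Equal_Case_Insensitive performs; 
neither is the Ada implementation. The Greek word used has two lower-case 
spellings of its last letter (sigma and final sigma), both of which 
correspond to the same capital.

```python
def per_char_lower(text: str) -> str:
    """Lower-case one character at a time, with no context rules.

    Hypothetical stand-in for a simple character-by-character To_Lower.
    """
    return "".join(ch.lower() for ch in text)

def equal_by_folding(left: str, right: str) -> bool:
    """Compare two strings after Unicode case folding (str.casefold)."""
    return left.casefold() == right.casefold()

# Greek "ODOS" in capitals: omicron, delta, omicron, capital sigma.
upper_word = "\u039f\u0394\u039f\u03a3"
# The usual lower-case spelling, ending in final sigma (U+03C2).
stored_word = "\u03bf\u03b4\u03bf\u03c2"

# Lower-casing both sides and comparing misses the match, because the
# capital sigma maps to U+03C3 while the stored word ends in U+03C2.
match_by_lowering = per_char_lower(upper_word) == per_char_lower(stored_word)

# Folding maps both sigma forms to the same character, so the match is found.
match_by_folding = equal_by_folding(upper_word, stored_word)

print(match_by_lowering, match_by_folding)
```

The stored strings stay exactly as they were entered; only the comparison 
ignores case.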

There is a nice example of this problem in the next chapter of the Ada 2012 
Rationale (although you'll have to wait until May to see it, unless you get 
the Ada User Journal).

I realize you may have no choice, since the design of your database might not 
be in your control, and it might not matter if you don't plan to have Greek 
and Turkish characters in your data (to mention two of the most common cases 
where converting to lower case and Equal_Case_Insensitive give different 
answers for Wide_Strings).

                                     Randy.