From: "Randy Brukardt"
Newsgroups: comp.lang.ada
Subject: Re: string and wide string usage
Date: Thu, 7 Mar 2013 17:53:25 -0600
Organization: Jacob Sparre Andersen Research & Innovation
References: <5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com>

"ytomino" wrote in message
news:5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com...
> On Thursday, March 7, 2013 8:12:01 PM UTC+9, Ali Bendriss wrote:
>> I've got a problem with some strings. For example,
>> a base 64 encoded string
>> V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=
>> which decodes to 'Windows\xa07 Professionnel N' in utf-8
>> everything works if I feed the database directly, but if I want to
>> apply Ada.Characters.Handling.To_Lower on the string before feeding the
>> database, postgres is not happy
>> 'ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa0 0x37'
>> it's not really a big deal, but I would like to understand where the
>> problem is. Do I have to use wide string?
>
> Because functions in Ada.Characters.Handling take not UTF-8 but Latin-1.

Right. To_Lower treats each byte as a Latin-1 character, so the UTF-8
lead byte 16#C2# (which is 'Â' in Latin-1) gets turned into 16#E2#
('â'), and that is exactly the invalid sequence Postgres is complaining
about.

The proper thing to do (for Ada 2012) is to use
Ada.Wide_Characters.Handling (or Ada.Wide_Wide_Characters.Handling) to
do the case conversion, after converting the UTF-8 into a Wide_String
(or Wide_Wide_String), for instance with the Decode functions in the
children of Ada.Strings.UTF_Encoding. If you're trying to do this in an
older version of Ada, you'll have to find a library to do the job.

But I want to caution you that "converting to lower case" is not a great
idea if you plan to support arbitrary Unicode strings. Such conversions
are somewhat ambiguous, and tend to make strings that are different
appear the same (and sometimes the reverse happens as well). Usually,
the best plan is to store the strings unmodified and use
Equal_Case_Insensitive to compare them (this uses the most accurate
comparison defined by Unicode, and has the advantage of being guaranteed
not to change in future character set standards, which is NOT true of
conversion to lower case). There is a nice example of this problem in
the next chapter of the Ada 2012 Rationale (although you'll have to wait
until May to see it, unless you get the Ada User Journal).
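
To make the difference concrete, here is a minimal sketch (not code from
the thread) that lower-cases the sample string both ways: byte by byte
with Ada.Characters.Handling, and character by character after decoding
to a Wide_String. The names Lower_Demo, Lower_UTF_8, Bytewise and
Charwise are made up for illustration, and the sketch assumes an
Ada 2012 compiler that provides Ada.Strings.UTF_Encoding and
Ada.Wide_Characters.Handling.

```ada
with Ada.Characters.Handling;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;
with Ada.Wide_Characters.Handling;

procedure Lower_Demo is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Strings;
   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;

   --  Hypothetical helper (not from the thread): decode UTF-8 to a
   --  Wide_String, lower-case the characters, then re-encode as UTF-8.
   function Lower_UTF_8 (Item : UTF_8_String) return UTF_8_String is
   begin
      return UTF.Encode
        (Ada.Wide_Characters.Handling.To_Lower (UTF.Decode (Item)));
   end Lower_UTF_8;

   --  U+00A0 (no-break space) is the two bytes 16#C2# 16#A0# in UTF-8.
   NBSP : constant String :=
     Character'Val (16#C2#) & Character'Val (16#A0#);

   --  The string from the original question, as UTF-8 bytes.
   Sample : constant UTF_8_String :=
     "Windows" & NBSP & "7 Professionnel N";

   --  Byte-wise conversion: every byte is treated as a Latin-1 character.
   Bytewise : constant String := Ada.Characters.Handling.To_Lower (Sample);

   --  Character-wise conversion through Wide_String.
   Charwise : constant UTF_8_String := Lower_UTF_8 (Sample);

   --  What each approach is expected to produce.
   Broken : constant String :=
     "windows" & Character'Val (16#E2#) & Character'Val (16#A0#)
     & "7 professionnel n";
   Intact : constant String := "windows" & NBSP & "7 professionnel n";
begin
   --  Checks that the Latin-1 routine turned the lead byte C2 into E2,
   --  giving the byte sequence E2 A0 37 from the Postgres error.
   Ada.Text_IO.Put_Line
     ("byte-wise result is corrupted: " & Boolean'Image (Bytewise = Broken));

   --  Checks that the decoded route left the NBSP encoding untouched.
   Ada.Text_IO.Put_Line
     ("char-wise result is intact:    " & Boolean'Image (Charwise = Intact));
end Lower_Demo;
```

Both checks should report TRUE. Note that this only illustrates the
conversion route; if the goal is just to compare strings, the advice
above about storing them unmodified and using a case-insensitive
comparison still applies.
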

I realize you may have no choice, since the design of your database
might not be in your control, and it might not matter if you don't plan
to have Greek or Turkish characters in your data (to mention two of the
most common cases where converting to lower case and
Equal_Case_Insensitive give different answers for Wide_Strings).

                              Randy.