From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: string and wide string usage
Date: Thu, 7 Mar 2013 18:14:44 +0100
Organization: cbb software GmbH
Message-ID: <1ax1cag856c6g.ditwwlkam2v1$.dlg@40tude.net>
References: <5e5e7e80-7d69-47e1-9550-19e2e0a211a9@googlegroups.com>
Reply-To: mailbox@dmitry-kazakov.de

On Thu, 7 Mar 2013 06:20:05 -0800 (PST), ytomino wrote:

> On Thursday, March 7, 2013 8:12:01 PM UTC+9, Ali Bendriss wrote:
>> I've got a problem with some strings. For example,
>> a base 64 encoded string
>> V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4=
>> which decodes to 'Windows\xa07 Professionnel N' in UTF-8.
>> Everything works if I feed the database directly, but if I
>> apply Ada.Characters.Handling.To_Lower to the string before feeding the
>> database, postgres is not happy:
>> 'ERROR: invalid byte sequence for encoding "UTF8": 0xe2 0xa0 0x37'
>> It's not really a big deal, but I would like to understand where the
>> problem is. Do I have to use wide strings?
>
> Because the functions in Ada.Characters.Handling take Latin-1, not UTF-8.
> You have to either
> 1. convert the UTF-8 String to Wide_Wide_String, process it as UTF-32 and
>    convert it back to UTF-8.
>    (Ada.Characters.Conversion also takes Latin-1. You have to use
>    GNAT.Encode_String/Decode_String or Ada.Strings.UTF_Encoding for
>    the conversion.)
> 2. find an external library that processes UTF-8 directly.

Provided the base 64 text encodes a UTF-8 string that you want to
convert to a lower-case UTF-8 string using the Unicode lower-case mapping,
you can use

   function To_Lowercase (Value : String) return String;

from

   http://www.dmitry-kazakov.de/ada/strings_edit.htm#7.6

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
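
For illustration, the failure reported in the quoted message can be reproduced at the byte level. This is only a minimal sketch, written in Python rather than Ada so that the bytes are easy to inspect. The helper latin1_to_lower is a hypothetical stand-in that approximates what a Latin-1 case mapping does when handed UTF-8 bytes; it is not the Ada implementation, and utf8_to_lower is likewise just an illustrative name for the decode, map, re-encode route suggested in the thread.

```python
import base64

# The Base64 text from the original question.
ENCODED = "V2luZG93c8KgNyBQcm9mZXNzaW9ubmVsIE4="


def latin1_to_lower(data: bytes) -> bytes:
    """Hypothetical stand-in for a Latin-1 lower-case mapping.

    Every byte is treated as one Latin-1 character, which is the
    assumption that breaks multi-byte UTF-8 sequences.
    """
    return data.decode("latin-1").lower().encode("latin-1")


def utf8_to_lower(data: bytes) -> bytes:
    """Decode UTF-8 to code points, map case, then re-encode as UTF-8."""
    return data.decode("utf-8").lower().encode("utf-8")


raw = base64.b64decode(ENCODED)   # b'Windows\xc2\xa07 Professionnel N'
broken = latin1_to_lower(raw)     # 0xC2 is read as 'Â' and mapped to 'â' (0xE2)
fixed = utf8_to_lower(raw)        # U+00A0 has no case, so C2 A0 is preserved

print(raw[7:10].hex())     # c2a037
print(broken[7:10].hex())  # e2a037, the sequence named in the postgres error
print(fixed[7:10].hex())   # c2a037
```

The no-break space U+00A0 is encoded in UTF-8 as C2 A0. Read as Latin-1, the byte C2 is the capital letter 'Â', whose lower-case form is E2. In UTF-8, E2 announces a three-byte sequence, and the following 37 ('7') is not a valid continuation byte, hence the error.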