From: Stoik
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
Date: Sat, 16 Nov 2013 07:09:48 -0800 (PST)
Message-ID: <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com>
In-Reply-To: <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>
References: <73e0853b-454a-467f-9dc7-84ca5b9c29b2@googlegroups.com> <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>

On Saturday, 16 November 2013 14:34:43 UTC+1, Dmitry A. Kazakov wrote:
> On Sat, 16 Nov 2013 05:12:29 -0800 (PST), Stoik wrote:
>
> > I am using gps 5.2.1 with utf-8 encoding in the editor. I tried to write a
> > simple routine to strip the diacritical marks from Polish texts. When
> > executing a test program, I got the "translation_error" message, and it
> > turned out that the string consisting of Polish letters was treated as
> > double the proper length. You can try for yourself: with
> > s: string := "ó";
> > we get s'length=2. Where is the hook? Is it a compiler error, gps error,
> > or my own one?
>
> Without source code it is impossible to say. But "ó" in UTF-8 is two
> octets: 16#C3# 16#B3#. When packed into a string that must be 2 characters
> long, considering octet=Character (which formally is not, but whatever).
>
> P.S. I would not use Latin-1 or anything beyond 7-bit ASCII in the source
> code in order to make it portable across different systems.
>
> --
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

Thanks for the answer. Your advice is certainly sound, but not very
satisfactory. The whole purpose of UTF-8 is to make things portable across
platforms. If the compiler cannot deal properly with source code written in
the UTF-8 encoding, then the whole effort that went into all the Wide_ and
Wide_Wide_ packages, and into the new packages that deal with various
encodings, is lost (all the Latin-x possibilities are useless anyway, at
least on Windows).

I am attaching a trivial program that behaves differently depending on the
encoding of the source file: it prints 2 when the file is saved as UTF-8 and
1 when it is saved as ISO-8859-1.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Example is
      --  One character on screen, but one or two octets in the file,
      --  depending on the encoding the source was saved in.
      S : String := "ó";
   begin
      Put_Line (S'Length'Img);  --  'Img is a GNAT-specific attribute
   end Example;
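
For comparison, here is a minimal sketch of the octet-versus-character
distinction discussed above. It assumes an Ada 2012 compiler, since it relies
on the standard package Ada.Strings.UTF_Encoding.Wide_Wide_Strings. The
procedure name Count_Code_Points and the object names are only illustrative.
The two octets of "ó" are written out with Character'Val, so the file itself
is pure 7-bit ASCII and should not depend on the encoding it is saved in.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Count_Code_Points is
   --  "ó" (U+00F3) spelled out as its two UTF-8 octets, 16#C3# 16#B3#,
   --  so that no non-ASCII character appears in this source file.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#B3#);

   --  Decode turns the UTF-8 octet sequence into one element per code point.
   Decoded : constant Wide_Wide_String :=
     Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Encoded);
begin
   --  Number of octets in the encoded form.
   Ada.Text_IO.Put_Line ("Octets:    " & Integer'Image (Encoded'Length));
   --  Number of characters after decoding.
   Ada.Text_IO.Put_Line ("Characters:" & Integer'Image (Decoded'Length));
end Count_Code_Points;
```

This should report 2 octets and 1 character: the String holds the encoded
form, and only the decoded Wide_Wide_String has one element per character.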