From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
Date: Sat, 16 Nov 2013 14:34:43 +0100
Organization: cbb software GmbH
Message-ID: <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>
References: <73e0853b-454a-467f-9dc7-84ca5b9c29b2@googlegroups.com>

On Sat, 16 Nov 2013 05:12:29 -0800 (PST), Stoik wrote:

> I am using gps 5.2.1 with utf-8 encoding in the editor. I tried to write a
> simple routine to strip the diacritical marks from Polish texts. When
> executing a test program, I got the "translation_error" message, and it
> turned out that the string consisting of Polish letters was treated as
> double the proper length. You can try for yourself: with
> s: string := "ó";
> we get s'length=2. Where is the hook? Is it a compiler error, gps error, or my own one?

Without the source code it is impossible to say. But "ó" in UTF-8 is two
octets: 16#C3# 16#B3#. Packed into a String, they occupy two Characters, so
the length must be 2, treating an octet as a Character (which formally it
is not, but whatever).

P.S. I would not use Latin-1 or anything beyond 7-bit ASCII in the source
code, in order to keep it portable across different systems.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
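
A minimal sketch of the octet count described above, assuming an Ada 2012 compiler that provides Ada.Strings.UTF_Encoding. The procedure name UTF8_Length_Demo is made up for illustration. The two octets are written out numerically, so the source file stays 7-bit ASCII and the result does not depend on the editor's or the compiler's encoding settings.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Hypothetical demo procedure, not taken from the original thread.
procedure UTF8_Length_Demo is

   --  "o with acute" (U+00F3) spelled out as its two UTF-8 octets.
   S : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#B3#);

   --  Decode the UTF-8 octets into code points.
   W : constant Wide_Wide_String :=
     Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (S);

begin
   --  S'Length counts octets (2 here); W'Length counts code points (1 here).
   Ada.Text_IO.Put_Line ("Octets in S:      " & Integer'Image (S'Length));
   Ada.Text_IO.Put_Line ("Code points in W: " & Integer'Image (W'Length));
end UTF8_Length_Demo;
```

With GNAT this should build with something like `gnatmake utf8_length_demo.adb`. The point is only that String'Length counts octets, so any routine that maps Polish letters one Character at a time will see each accented letter as two elements unless the text is decoded first.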