From: Stoik <staszek.goldstein@gmail.com>
Subject: Re: strange behaviour of utf-8 files
Date: Sat, 16 Nov 2013 07:09:48 -0800 (PST)
Message-ID: <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com>
In-Reply-To: <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>
On Saturday, 16 November 2013 14:34:43 UTC+1, Dmitry A. Kazakov wrote:
> On Sat, 16 Nov 2013 05:12:29 -0800 (PST), Stoik wrote:
>
> > I am using gps 5.2.1 with utf-8 encoding in the editor. I tried to write a
> > simple routine to strip the diacritical marks from Polish texts. When
> > executing a test program, I got the "translation_error" message, and it
> > turned out that the string consisting of Polish letters was treated as
> > double the proper length. You can try for yourself: with
> > s: string := "ó";
> > we get s'length=2. Where is the hook? Is it a compiler error, gps error, or my own one?
>
> Without source code it is impossible to say. But "ó" in UTF-8 is two
> octets: 16#C3# 16#B3#. When packed into a string that must be 2 characters
> long, considering octet=Character (which formally is not, but whatever).
>
> P.S. I would not use Latin-1 or anything beyond 7-bit ASCII in the source
> code in order to make it portable across different systems.
>
> --
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

Thanks for the answer. Your advice is certainly sound, but not very
satisfying. The whole purpose of UTF-8 is to make things portable across
platforms. If the compiler cannot deal properly with source code written in
the UTF-8 encoding, then all the effort that went into the Wide_ and
Wide_Wide_ packages, and into the new packages that deal with various
encodings, is lost (all the Latin-x options are useless anyway, at least on
the Windows platform).

I am attaching a trivial program that behaves differently depending on the
encoding of the source file: it prints 1 when the file is saved as
ISO-8859-1 and 2 when it is saved as UTF-8.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Example is
      S : constant String := "ó";
   begin
      --  Prints 1 or 2 depending on the encoding of this source file.
      Put_Line (S'Length'Img);
   end Example;
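
For comparison, here is a minimal sketch of how the octet count and the character count can be told apart explicitly. It assumes an Ada 2012 compiler that provides Ada.Strings.UTF_Encoding; the procedure name UTF8_Length is only illustrative. The two octets quoted above (16#C3# 16#B3#) are written out by value, so the result should not depend on how the source file itself is saved.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure UTF8_Length is
   use Ada.Strings.UTF_Encoding;

   --  "ó" spelled out as its two UTF-8 octets, so that the encoding of
   --  this source file plays no part in the result.
   Encoded : constant UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#B3#);

   --  Decode the octets into one Wide_Character per code point.
   Decoded : constant Wide_String := Wide_Strings.Decode (Encoded);
begin
   Put_Line ("octets:    " & Integer'Image (Encoded'Length));
   Put_Line ("characters:" & Integer'Image (Decoded'Length));
end UTF8_Length;
```

With such a compiler this should report 2 octets and 1 character, which suggests that a routine stripping diacritical marks would be better off working on the decoded Wide_String than on the raw String.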