comp.lang.ada
From: Stoik <staszek.goldstein@gmail.com>
Subject: Re: strange behaviour of utf-8 files
Date: Sat, 16 Nov 2013 07:09:48 -0800 (PST)
Message-ID: <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com>
In-Reply-To: <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>

On Saturday, 16 November 2013 14:34:43 UTC+1, Dmitry A. Kazakov wrote:
> On Sat, 16 Nov 2013 05:12:29 -0800 (PST), Stoik wrote:
>
> > I am using GPS 5.2.1 with UTF-8 encoding in the editor. I tried to write a
> > simple routine to strip the diacritical marks from Polish texts. When
> > executing a test program, I got the Translation_Error message, and it
> > turned out that the string consisting of Polish letters was treated as
> > double its proper length. You can try it yourself: with
> > s: string := "ó";
> > we get s'length=2. Where is the catch? Is it a compiler error, a GPS error, or my own?
>
> Without source code it is impossible to say. But "ó" in UTF-8 is two
> octets: 16#C3# 16#B3#. Packed into a String, that must be 2 characters
> long, taking octet = Character (which formally it is not, but whatever).
>
> P.S. I would not use Latin-1 or anything beyond 7-bit ASCII in the source
> code, in order to keep it portable across different systems.
>
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de
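To make the octet point concrete, here is a minimal sketch, assuming an Ada 2012 compiler (for Ada.Strings.UTF_Encoding); the procedure name Octets_Demo is hypothetical. It builds the two UTF-8 octets of "ó" from their codes, so the file itself stays 7-bit ASCII, and shows that a String holding them has length 2 while the decoded text has length 1. It also shows that Ada.Strings.Maps.To_Mapping raises Ada.Strings.Translation_Error when From and To differ in length, which is one plausible source of the error in the original question.

```ada
with Ada.Text_IO;
with Ada.Strings.Maps;
with Ada.Strings.UTF_Encoding.Strings;

procedure Octets_Demo is
   --  "ó" (U+00F3) encoded as UTF-8: the octets 16#C3# 16#B3#, spelled out
   --  with Character'Val so that this source file stays pure 7-bit ASCII.
   Raw : constant String := (Character'Val (16#C3#), Character'Val (16#B3#));

   --  Decoding the UTF-8 octets yields a Latin-1 String ("ó" is in Latin-1).
   Decoded : constant String := Ada.Strings.UTF_Encoding.Strings.Decode (Raw);
begin
   Ada.Text_IO.Put_Line ("octets:    " & Integer'Image (Raw'Length));
   Ada.Text_IO.Put_Line ("characters:" & Integer'Image (Decoded'Length));

   --  To_Mapping requires From and To to have equal lengths. Mapping the
   --  two raw octets onto the single letter "o" therefore fails.
   declare
      M : constant Ada.Strings.Maps.Character_Mapping :=
        Ada.Strings.Maps.To_Mapping (From => Raw, To => "o");
   begin
      Ada.Text_IO.Put_Line ("no error: " & Ada.Strings.Maps.Value (M, 'x'));
   end;
exception
   when Ada.Strings.Translation_Error =>
      Ada.Text_IO.Put_Line ("Translation_Error raised");
end Octets_Demo;
```

Run as is, it reports 2 octets, 1 character, and then the Translation_Error.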

Thanks for the answer. Your advice is certainly sound, but not very satisfying. The whole purpose of UTF-8 is to make
things portable across platforms. If the compiler cannot properly handle
source code written in UTF-8, then all the effort that went into
the Wide_ and Wide_Wide_ packages and the new packages that deal with various encodings is wasted (the Latin-x options are useless anyway, at least on the Windows platform). I am attaching a trivial program that behaves differently depending on the encoding of the source file, printing 2 when it is saved as UTF-8 and 1 when it is saved as ISO-8859-1.

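with Ada.Text_IO; use Ada.Text_IO;

procedure Example is
   --  One Character per source octet: the literal has length 2 if this
   --  file is saved as UTF-8 and length 1 if it is saved as ISO-8859-1.
   S : constant String := "ó";
begin
   Put_Line (Integer'Image (S'Length));
end Example;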
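For the original goal of stripping Polish diacritics, one portable approach, sketched here under the assumption of an Ada 2012 compiler, follows Dmitry's advice and keeps the source 7-bit ASCII: build the accented letters from their code points, decode UTF-8 input to Wide_Wide_String, translate it with a mapping from Ada.Strings.Wide_Wide_Maps.To_Mapping via Ada.Strings.Wide_Wide_Fixed.Translate, and re-encode the result. A plain String cannot work here, because most Polish letters (such as ą, ć, ę and ł) lie outside Latin-1. The names Strip_Polish, Strip and C are hypothetical.

```ada
with Ada.Text_IO;
with Ada.Strings.Wide_Wide_Maps;
with Ada.Strings.Wide_Wide_Fixed;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Strip_Polish is
   use Ada.Strings.Wide_Wide_Maps;

   --  Hypothetical helper: a character from its code point, which keeps
   --  this source file pure 7-bit ASCII.
   function C (Code : Natural) return Wide_Wide_Character is
     (Wide_Wide_Character'Val (Code));

   --  Polish letters with diacritics, lower case then upper case:
   --  a-ogonek, c-acute, e-ogonek, l-stroke, n-acute, o-acute, s-acute,
   --  z-acute, z-dot.
   Accented : constant Wide_Wide_String :=
     (C (16#0105#), C (16#0107#), C (16#0119#), C (16#0142#), C (16#0144#),
      C (16#00F3#), C (16#015B#), C (16#017A#), C (16#017C#),
      C (16#0104#), C (16#0106#), C (16#0118#), C (16#0141#), C (16#0143#),
      C (16#00D3#), C (16#015A#), C (16#0179#), C (16#017B#));

   --  Their plain counterparts, in the same order (equal lengths, so
   --  To_Mapping does not raise Translation_Error).
   Plain : constant Wide_Wide_String := "acelnoszzACELNOSZZ";

   Strip_Map : constant Wide_Wide_Character_Mapping :=
     To_Mapping (From => Accented, To => Plain);

   --  Decode UTF-8 input, map each character, re-encode as UTF-8.
   function Strip (Input : String) return String is
     (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
        (Ada.Strings.Wide_Wide_Fixed.Translate
           (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Input),
            Strip_Map)));

   --  "Zażółć" as UTF-8 octets, built from code points.
   Sample : constant String :=
     Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
       ("Za" & C (16#017C#) & C (16#00F3#) & C (16#0142#) & C (16#0107#));
begin
   Ada.Text_IO.Put_Line (Strip (Sample));
end Strip_Polish;
```

Run as is, it prints Zazolc. Wide_Wide_String is used so that every code point occupies exactly one element, which is what a character-by-character Translate needs.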



Thread overview: 33+ messages
2013-11-16 13:12 strange behaviour of utf-8 files Stoik
2013-11-16 13:34 ` Dmitry A. Kazakov
2013-11-16 15:09   ` Stoik [this message]
2013-11-16 15:55     ` Dmitry A. Kazakov
2013-11-17 13:32       ` Georg Bauhaus
2013-11-17 14:07         ` Dmitry A. Kazakov
2013-11-17 17:19           ` Dennis Lee Bieber
2013-11-17 18:07             ` Dmitry A. Kazakov
2013-11-17 19:05           ` Georg Bauhaus
2013-11-17 20:38             ` Dmitry A. Kazakov
2013-11-18  8:38               ` Georg Bauhaus
2013-11-18  9:01                 ` Dmitry A. Kazakov
2013-11-18 10:06                   ` Georg Bauhaus
2013-11-18  8:44               ` Georg Bauhaus
2013-11-18 10:24                 ` Dmitry A. Kazakov
2013-11-18 13:05                   ` G.B.
2013-11-18 15:25                     ` Dmitry A. Kazakov
2013-11-18 15:51                       ` G.B.
2013-11-18 17:34                         ` Dmitry A. Kazakov
2013-11-18  0:34           ` Stoik
2013-11-16 17:01     ` Georg Bauhaus
2013-11-17 10:38       ` Stoik
2013-11-16 15:12   ` Stoik
2013-11-16 15:57     ` Dmitry A. Kazakov
2013-11-17 11:12       ` Stoik
2013-11-22  1:03         ` Randy Brukardt
2013-11-22  3:02           ` Shark8
2013-11-22 11:54             ` Georg Bauhaus
2013-11-23  4:14             ` Randy Brukardt
2013-12-06  2:17               ` Georg Bauhaus
2013-11-16 20:06     ` Peter C. Chapin
2013-11-17 10:34       ` Stoik
2013-11-22  0:53       ` Randy Brukardt
