From: "Björn Persson" <spam-away@nowhere.nil>
Subject: UTF-8 in strings - a bug?
Date: Wed, 05 May 2004 22:12:03 GMT
Message-ID: <TEdmc.58085$mU6.237063@newsb.telia.net>

The reference manual says:

  3.5.2(2): The predefined type Character is a character type whose
  values correspond to the 256 code positions of Row 00 (also known as
  Latin-1) of the ISO 10646 Basic Multilingual Plane (BMP).

  3.6.3(4): type String is array(Positive range <>) of Character;

It seems clear to me: Strings are Latin-1 (except for programs compiled
in nonstandard modes). But when I set my Fedora system to use UTF-8, the
strings I get from Ada.Command_Line.Argument contain UTF-8-encoded text.
This means that some of the elements in the string aren't characters,
only byte values that are parts of multi-byte characters. And of course
'Length returns the number of bytes, not the number of characters. This
looks like a violation of the standard. Should I consider this a bug in
the library, or in the compiler (GNAT (GCC) 3.3.2 and 3.4.0)?
--
Björn Persson
jor ers @sv ge.
b n_p son eri nu