comp.lang.ada
From: "Björn Persson" <spam-away@nowhere.nil>
Subject: UTF-8 in strings - a bug?
Date: Wed, 05 May 2004 22:12:03 GMT
Message-ID: <TEdmc.58085$mU6.237063@newsb.telia.net>

The reference manual says:

3.5.2(2): The predefined type Character is a character type whose values 
correspond to the 256 code positions of Row 00 (also known as Latin-1) 
of the ISO 10646 Basic Multilingual Plane (BMP).

3.6.3(4): type String is array(Positive range <>) of Character;

It seems clear to me: Strings are Latin-1 (except for programs compiled 
in nonstandard modes). But when I set my Fedora system to use UTF-8, the 
strings I get from Ada.Command_Line.Argument contain UTF-8. This means 
that some of the elements in the string aren't characters, only byte 
values that are parts of multi-byte characters. And of course 'Length 
returns the number of bytes, not the number of characters. This looks 
like a violation of the standard. Should I consider this a bug in the 
library? Or in the compiler (GNAT, GCC 3.3.2 and 3.4.0)?
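
To illustrate the difference between bytes and characters described 
above, here is a minimal sketch. It assumes the command-line arguments 
are well-formed UTF-8. The names Show_Lengths and UTF_8_Length are 
hypothetical and not part of any library; the helper simply skips UTF-8 
continuation bytes.

```ada
with Ada.Text_IO;      use Ada.Text_IO;
with Ada.Command_Line; use Ada.Command_Line;

procedure Show_Lengths is

   --  Hypothetical helper: counts UTF-8 encoded characters by skipping
   --  continuation bytes (bit pattern 10xxxxxx, i.e. 16#80# .. 16#BF#).
   --  Assumption: Item holds well-formed UTF-8; no validation is done.
   function UTF_8_Length (Item : String) return Natural is
      Count : Natural := 0;
   begin
      for I in Item'Range loop
         if Item (I) not in Character'Val (16#80#) .. Character'Val (16#BF#) then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end UTF_8_Length;

begin
   for N in 1 .. Argument_Count loop
      declare
         Arg : constant String := Argument (N);
      begin
         --  Arg'Length is the number of String elements (bytes here),
         --  which differs from the character count for non-ASCII text.
         Put_Line (Natural'Image (Arg'Length) & " elements,"
                   & Natural'Image (UTF_8_Length (Arg)) & " characters");
      end;
   end loop;
end Show_Lengths;
```

If the arguments really arrive as UTF-8, an argument such as "Björn" 
would be expected to report 6 elements but 5 characters, since "ö" is 
encoded as two bytes.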

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu



