comp.lang.ada
From: "Robert I. Eachus" <rieachus@comcast.net>
Subject: Re: UTF-8 in strings - a bug?
Date: Wed, 05 May 2004 19:31:15 -0400
Message-ID: <WJOdndbsxKPZ5ATdRVn-iQ@comcast.com>
In-Reply-To: <TEdmc.58085$mU6.237063@newsb.telia.net>

Björn Persson wrote:

> The reference manual says:
> 
> 3.5.2(2): The predefined type Character is a character type whose values 
> correspond to the 256 code positions of Row 00 (also known as Latin-1) 
> of the ISO 10646 Basic Multilingual Plane (BMP).
> 
> 3.6.3(4): type String is array(Positive range <>) of Character;
> 
> It seems clear to me: Strings are Latin-1 (except for programs compiled 
> in nonstandard modes). But when I set my Fedora system to use UTF-8, the 
> strings I get from Ada.Command_Line.Argument contain UTF-8. This means 
> that some of the elements in the string aren't characters, only byte 
> values that are parts of multi-byte characters. And of course 'Length 
> returns the number of bytes, not the number of characters. This looks 
> like a violation of the standard. Should I consider this a bug in the 
> library? Or in the compiler (Gnat (GCC) 3.3.2 and 3.4.0)?

Hmmmm...  The technical answer is that GNAT is not validated on Fedora 
with UTF-8.  The practical answer is that with GNAT, you should compile 
using the UTF-8 non-standard mode, if you are using UTF-8.

But what if you want to validate on Fedora in UTF-8 mode?  Then you will 
have to modify the libraries to get this "right."
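
To make the 'Length point from the quoted question concrete, here is a 
minimal sketch in plain Ada 95. It assumes the arguments really are 
well-formed UTF-8; UTF8_Length is a hypothetical helper written for this 
illustration, not part of any library. It counts characters by skipping 
UTF-8 continuation bytes, which always have the bit pattern 10xxxxxx.

```ada
with Ada.Command_Line;
with Ada.Text_IO;

procedure Count_Chars is

   --  Hypothetical helper (not a library routine): counts the encoded
   --  characters in S, ASSUMING S holds well-formed UTF-8. Continuation
   --  bytes lie in 16#80# .. 16#BF#, so their code divided by 64 is 2.
   function UTF8_Length (S : String) return Natural is
      Count : Natural := 0;
   begin
      for I in S'Range loop
         if Character'Pos (S (I)) / 64 /= 2 then
            Count := Count + 1;  --  a lead byte or a plain ASCII byte
         end if;
      end loop;
      return Count;
   end UTF8_Length;

begin
   for N in 1 .. Ada.Command_Line.Argument_Count loop
      declare
         Arg : constant String := Ada.Command_Line.Argument (N);
      begin
         --  'Length counts String elements, which are bytes here
         Ada.Text_IO.Put_Line
           (Natural'Image (Arg'Length) & " bytes," &
            Natural'Image (UTF8_Length (Arg)) & " characters");
      end;
   end loop;
end Count_Chars;
```

If the environment passes arguments as UTF-8, as described in the 
question, an argument such as Björn should be reported as 6 bytes and 5 
characters, since the o-umlaut occupies two bytes. The sketch does no 
validation, so malformed input will simply be miscounted.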


-- 

                                           Robert I. Eachus

"The terrorist enemy holds no territory, defends no population, is 
unconstrained by rules of warfare, and respects no law of morality. Such 
an enemy cannot be deterred, contained, appeased or negotiated with. It 
can only be destroyed--and that, ladies and gentlemen, is the business 
at hand."  -- Dick Cheney



