From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII
X-Google-Thread: 103376,1086bab45b40d4b0
X-Google-Attributes: gid103376,public
Path: 
 controlnews3.google.com!news2.google.com!news.maxwell.syr.edu!central.cox.net!east.cox.net!filt01.cox.net!peer01.cox.net!cox.net!atl-c02.usenetserver.com!news.usenetserver.com!border1.nntp.ash.giganews.com!nntp.giganews.com!local1.nntp.ash.giganews.com!nntp.comcast.com!news.comcast.com.POSTED!not-for-mail
NNTP-Posting-Date: Wed, 05 May 2004 18:31:15 -0500
Date: Wed, 05 May 2004 19:31:15 -0400
From: "Robert I. Eachus" <rieachus@comcast.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
 rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 in strings - a bug?
References: <TEdmc.58085$mU6.237063@newsb.telia.net>
In-Reply-To: <TEdmc.58085$mU6.237063@newsb.telia.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Message-ID: <WJOdndbsxKPZ5ATdRVn-iQ@comcast.com>
NNTP-Posting-Host: 24.147.90.114
X-Trace: 
 sv3-wAKTzm6d+Jk944zT9/XcvTDejO3UrUdMCh+VHEY6bYEAuQqBJwrL98YnWQln0kuBcV0jkEtms7c4cjc!D0r9ToxoB7abxWCRDaItYIJBlA3UNm71pFFS8wBXoypf1JlDH/LEm6rICvo3Zw==
X-Complaints-To: abuse@comcast.net
X-DMCA-Complaints-To: dmca@comcast.net
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint
 properly
X-Postfilter: 1.1
Xref: controlnews3.google.com comp.lang.ada:298
Date: 2004-05-05T19:31:15-04:00
List-Id: <comp.lang.ada>

Björn Persson wrote:

> The reference manual says:
> 
> 3.5.2(2): The predefined type Character is a character type whose values 
> correspond to the 256 code positions of Row 00 (also known as Latin-1) 
> of the ISO 10646 Basic Multilingual Plane (BMP).
> 
> 3.6.3(4): type String is array(Positive range <>) of Character;
> 
> It seems clear to me: Strings are Latin-1 (except for programs compiled 
> in nonstandard modes). But when I set my Fedora system to use UTF-8, the 
> strings I get from Ada.Command_Line.Argument contain UTF-8. This means 
> that some of the elements in the string aren't characters, only byte 
> values that are parts of multi-byte characters. And of course 'Length 
> returns the number of bytes, not the number of characters. This looks 
> like a violation of the standard. Should I consider this a bug in the 
> library? Or in the compiler (Gnat (GCC) 3.3.2 and 3.4.0)?

Hmmmm...  The technical answer is that GNAT is not validated on Fedora 
with UTF-8.  The practical answer is that, if you are using UTF-8 with 
GNAT, you should compile in GNAT's non-standard UTF-8 mode (the -gnatW8 
switch).

But what if you want to validate on Fedora in UTF-8 mode?  Then you will 
have to modify the libraries to get this "right."
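
A minimal sketch of the byte/character distinction Björn describes, 
assuming each argument arrives as well-formed UTF-8.  The names 
Show_UTF8_Length and UTF8_Length are hypothetical and not part of 
GNAT's library.  The helper relies on the fact that every UTF-8 
continuation byte lies in 16#80# .. 16#BF#, so counting the other bytes 
counts code points.  It uses only Ada 95 features; Ada 2012 later added 
Ada.Strings.UTF_Encoding for real decoding.

```ada
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_IO;      use Ada.Text_IO;

procedure Show_UTF8_Length is

   --  Hypothetical helper: counts code points in a String whose
   --  elements are really UTF-8 bytes.  Each code point has exactly
   --  one lead byte; continuation bytes are 2#10xx_xxxx#, i.e.
   --  16#80# .. 16#BF#, so we count every byte outside that range.
   --  Assumes well-formed UTF-8 (no validation is done).
   function UTF8_Length (S : String) return Natural is
      Count : Natural := 0;
   begin
      for I in S'Range loop
         if Character'Pos (S (I)) not in 16#80# .. 16#BF# then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end UTF8_Length;

begin
   for N in 1 .. Argument_Count loop
      declare
         Arg : constant String := Argument (N);
      begin
         --  'Length counts Character elements, which are bytes here.
         Put_Line (Arg & ": 'Length =" & Integer'Image (Arg'Length)
                   & ", code points =" & Integer'Image (UTF8_Length (Arg)));
      end;
   end loop;
end Show_UTF8_Length;
```

Run as "show_utf8_length Björn" from a UTF-8 terminal, it should print 
"Björn: 'Length = 6, code points = 5", because the "ö" arrives as the 
two bytes 16#C3# 16#B6#.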


-- 

                                           Robert I. Eachus

"The terrorist enemy holds no territory, defends no population, is 
unconstrained by rules of warfare, and respects no law of morality. Such 
an enemy cannot be deterred, contained, appeased or negotiated with. It 
can only be destroyed--and that, ladies and gentlemen, is the business 
at hand."  -- Dick Cheney