From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII
X-Google-Thread: 103376,1086bab45b40d4b0,start
X-Google-Attributes: gid103376,public
Path: 
 controlnews3.google.com!news2.google.com!news.maxwell.syr.edu!newsfeed.icl.net!newsfeed.arcor.de!news.tele.dk!news.tele.dk!small.news.tele.dk!newsfeed101.telia.com!nf02.dk.telia.net!news-stob.telia.net!telia.net!217.209.241.173.MISMATCH!masternews.telia.net.!newsb.telia.net.POSTED!not-for-mail
From: =?ISO-8859-1?Q?Bj=F6rn_Persson?= <spam-away@nowhere.nil>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114
X-Accept-Language: sv, en-us
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: UTF-8 in strings - a bug?
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Message-ID: <TEdmc.58085$mU6.237063@newsb.telia.net>
Date: Wed, 05 May 2004 22:12:03 GMT
NNTP-Posting-Host: 217.209.116.179
X-Complaints-To: abuse@telia.com
X-Trace: newsb.telia.net 1083795123 217.209.116.179 (Thu,
 06 May 2004 00:12:03 CEST)
NNTP-Posting-Date: Thu, 06 May 2004 00:12:03 CEST
Organization: Telia Internet
Xref: controlnews3.google.com comp.lang.ada:293
Date: 2004-05-05T22:12:03+00:00
List-Id: <comp.lang.ada>

The reference manual says:

3.5.2(2): The predefined type Character is a character type whose values
correspond to the 256 code positions of Row 00 (also known as Latin-1)
of the ISO 10646 Basic Multilingual Plane (BMP).

3.6.3(4): type String is array(Positive range <>) of Character;

It seems clear to me: Strings are Latin-1 (except for programs compiled
in nonstandard modes). But when I set my Fedora system to use UTF-8, the
strings I get from Ada.Command_Line.Argument contain UTF-8. This means
that some of the elements in the string aren't characters, only byte
values that are parts of multi-byte characters. And of course 'Length
returns the number of bytes, not the number of characters. This looks
like a violation of the standard. Should I consider this a bug in the
library? Or in the compiler (GNAT (GCC) 3.3.2 and 3.4.0)?
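
To make the mismatch concrete, here is a minimal sketch that prints, for
each command-line argument, both 'Length and the number of UTF-8 encoded
characters. UTF8_Length is a hypothetical helper written for this
illustration, not something from the standard library, and it assumes
the argument really is well-formed UTF-8: it simply skips continuation
bytes (bit pattern 10xxxxxx).

```ada
with Ada.Command_Line;
with Ada.Text_IO;

procedure Count_Args is

   --  Hypothetical helper (an assumption, not a library routine).
   --  Counts the elements of S that are not UTF-8 continuation bytes,
   --  i.e. whose two top bits are not 2#10#. For well-formed UTF-8
   --  this equals the number of encoded characters.
   function UTF8_Length (S : String) return Natural is
      Count : Natural := 0;
   begin
      for I in S'Range loop
         --  Positions 128 .. 191 divided by 64 give 2: continuation byte.
         if Character'Pos (S (I)) / 64 /= 2 then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end UTF8_Length;

begin
   for N in 1 .. Ada.Command_Line.Argument_Count loop
      declare
         Arg : constant String := Ada.Command_Line.Argument (N);
      begin
         Ada.Text_IO.Put_Line
           ("argument" & Integer'Image (N)
            & ":" & Integer'Image (Arg'Length) & " elements,"
            & Integer'Image (UTF8_Length (Arg)) & " UTF-8 characters");
      end;
   end loop;
end Count_Args;
```

With a UTF-8 locale, an argument such as "för" should be reported as 4
elements but 3 characters, since "ö" is encoded as two bytes.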

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu