From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII
X-Google-Thread: 103376,1086bab45b40d4b0
X-Google-Attributes: gid103376,public
Path: 
 controlnews3.google.com!news1.google.com!news.glorb.com!news-stoc.telia.net!news-stoa.telia.net!telia.net!masternews.telia.net.!newsb.telia.net.POSTED!not-for-mail
From: =?ISO-8859-1?Q?Bj=F6rn_Persson?= <spam-away@nowhere.nil>
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1) Gecko/20031114
X-Accept-Language: sv, en-us
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 in strings - a bug?
References: <TEdmc.58085$mU6.237063@newsb.telia.net>
 <WJOdndbsxKPZ5ATdRVn-iQ@comcast.com> <lMmmc.58280$mU6.237078@newsb.telia.net>
 <200456-112553-85684@foorum.com> <2178612.8V5KANFFf5@linux1.krischik.com>
 <q0Vmc.58459$mU6.237464@newsb.telia.net> <pld65f8isk.fsf@sparre.crs4.it>
In-Reply-To: <pld65f8isk.fsf@sparre.crs4.it>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Message-ID: <4b3nc.58514$mU6.237399@newsb.telia.net>
Date: Sat, 08 May 2004 11:06:40 GMT
NNTP-Posting-Host: 217.209.116.179
X-Complaints-To: abuse@telia.com
X-Trace: newsb.telia.net 1084014400 217.209.116.179 (Sat,
 08 May 2004 13:06:40 CEST)
NNTP-Posting-Date: Sat, 08 May 2004 13:06:40 CEST
Organization: Telia Internet
Xref: controlnews3.google.com comp.lang.ada:384
Date: 2004-05-08T11:06:40+00:00
List-Id: <comp.lang.ada>

Jacob Sparre Andersen wrote:

> Your quotes (which may be unfair :-)

Sorry, I should have provided more context. Here's the relevant part of
unicode/unicode.ads in XML/Ada version 1.0 from ACT-Europe, so you don't
have to download the library just to see what I'm talking about:


--  Coded character sets  (packages Unicode.CCS.*)
--  ====================
--  Mapping from a set of abstract characters to the set of non-negative
--  integers
--  The integer associated with a character is called "code point", and the
--  character is called "encoded character"
--  Examples of these are:  ISO/8859-1, JIS X 0208, ...
--
--  Character naming (packages Unicode.Names.*)
--  ================
--  A unique name is assigned to each abstract character, so that it is
--  possible to get the same character no matter what repertoire is used.
--
--  Character Encoding Forms
--  ========================
--  Mapping from the set of integers used in a Coded Character Set to the set
--  of sequences of code units.
--  A "code unit" is integer occupying a specified binary width in a computer
--  architecture
--  Examples of fixed-width encoding forms:  7-bit, 8-bit, EBCDIC
--  Examples of variable-width encoding forms:  Utf-8, Utf-16,...
--
--  Character Encoding Scheme (packages Unicode.CES.*)
--  =========================
--  Mapping of code units into serialized byte sequences. It also takes into
--  account the byte-order serialization.

--  As a summary, converting a file containing latin-1 characters coded on
--  8 bits to a Utf8 latin2 file, the following steps are involved:
--
--     Latin1 string  (contains bytes associated with code points in Latin1)
--       |    "use Unicode.CES.Basic_8bit.To_Utf32"
--       v
--     Utf32 latin1 string (contains code points in Latin1)
--       |    "Convert argument to To_Utf32 should be
--       v         Unicode.CCS.Iso_8859_1.Convert"
--     Utf32 Unicode string (contains code points in Unicode)
--       |    "use Unicode.CES.Utf8.From_Utf32"
--       v
--     Utf8 Unicode string (contains code points in Unicode)
--       |    "Convert argument to From_Utf32 should be
--       v         Unicode.CCS.Iso_8859_2.Convert"
--     Utf8 Latin2 string (contains code points in Latin2)


Investigating further, I see that docs/xml_2.html shows the exact same
example of converting Latin-1 to "Utf8 Latin2".
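
To make the layers in the quoted comment easier to follow, here is a rough
sketch in Python rather than Ada. It does not use XML/Ada at all: the two
helper functions are hypothetical names of my own, built only on Python's
standard "latin-1", "utf-8" and "iso-8859-2" codecs. It shows bytes in a
coded character set, then code points, then UTF-8 code units, and finally a
separate re-encoding to Latin-2 bytes as a point of comparison.

```python
def latin1_bytes_to_code_points(data: bytes) -> list:
    """Hypothetical helper: Latin-1 bytes -> list of Unicode code points.

    Each Latin-1 byte value equals the number of the corresponding
    Unicode code point, so decoding is a one-to-one mapping.
    """
    return [ord(ch) for ch in data.decode("latin-1")]


def code_points_to_utf8(code_points: list) -> bytes:
    """Hypothetical helper: Unicode code points -> UTF-8 byte sequence."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


# "Björn" as Latin-1 bytes: the o-umlaut is the single byte 0xF6.
latin1_input = b"Bj\xf6rn"

# Step 1: bytes in the Latin-1 coded character set -> Unicode code points.
code_points = latin1_bytes_to_code_points(latin1_input)

# Step 2: Unicode code points -> UTF-8 (o-umlaut becomes two bytes).
utf8_output = code_points_to_utf8(code_points)

# For comparison: re-encoding the same text as Latin-2 yields plain
# 8-bit Latin-2 bytes, which is a different result from the UTF-8 one.
latin2_output = utf8_output.decode("utf-8").encode("iso-8859-2")

print(code_points)
print(utf8_output)
print(latin2_output)
```

In this sketch the UTF-8 result still carries Unicode code points, and the
Latin-2 result is an ordinary 8-bit string, so the two outputs differ.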

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu