From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 in strings - a bug?
Date: Sat, 8 May 2004 12:10:37 +0000 (UTC)
Organization: GMUGHDU
References: <200456-112553-85684@foorum.com> <2178612.8V5KANFFf5@linux1.krischik.com>

Björn Persson wrote:
: Well, I suppose the existence of that library is a good thing, but after
: reading the introduction in unicode.ads I have to wonder whether it's
: them or me who have misunderstood Unicode. It mentions "Utf32 Latin1"
: and "Utf8 Latin2" strings. This looks really weird to me. You don't
: encode Latin-1 in UTF-32 or Latin-2 in UTF-8. You encode Unicode in
: UTF-8 or UTF-32, or you encode a subset of Unicode in Latin-1, or
: another subset in Latin-2.

What is meant, I think, is that the Latin-1 characters, taken as
abstract characters, each have a code point in Unicode, and that code
point can be represented as a UTF-32 encoded character. They could
just as well be encoded using UTF-8 or UTF-16.
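To make that concrete, here is a small Python sketch (my own
illustration, not anything taken from the library under discussion).
It takes one character that exists in both Latin-1 and Unicode and
shows its bytes under Latin-1, UTF-8, UTF-16 and UTF-32. The variable
names are made up, and the big-endian codec variants are chosen only
so that no byte order mark gets prepended.

```python
# Illustrative sketch: one abstract character, several byte encodings.
char = "\u00c9"  # LATIN CAPITAL LETTER E WITH ACUTE

# The Unicode code point is a property of the abstract character,
# independent of how it is later serialized.
code_point = ord(char)

# Big-endian variants are used so that no byte order mark is added.
encodings = ["latin-1", "utf-8", "utf-16-be", "utf-32-be"]
encoded = {name: char.encode(name) for name in encodings}

print(f"code point: U+{code_point:04X}")
for name, data in encoded.items():
    print(f"{name:10} {data.hex()}")
```

The code point is the same in every case; only the byte sequence
differs, and for this character the Latin-1 byte happens to equal the
code point value.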
Latin_Capital_Letter_E_With_Acute is present in ISO 8859-1 as well as
in Unicode (as U+00C9), and in Unicode several encoding forms (UTF-8,
UTF-16, UTF-32) give different bit patterns for that one code point.
Unicode does have various Latin blocks, but I'm not sure what the
"Utf8 Latin2" line is supposed to mean either.

-- Georg