From: Martin Krischik
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 in strings - a bug?
Date: Sat, 08 May 2004 08:38:23 +0200
Organization: AdaCL

Björn Persson wrote:

> Martin Krischik wrote:
>
>> XMLAda comes with a Unicode library which can do some transcoding.
>
> Well, I suppose the existence of that library is a good thing, but after
> reading the introduction in unicode.ads I have to wonder whether it's
> them or me who have misunderstood Unicode. It mentions "Utf32 Latin1"
> and "Utf8 Latin2" strings. This looks really weird to me. You don't
> encode Latin-1 in UTF-32 or Latin-2 in UTF-8. You encode Unicode in
> UTF-8 or UTF-32, or you encode a subset of Unicode in Latin-1, or
> another subset in Latin-2.
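The distinction Björn draws (Unicode is the character repertoire, while UTF-8, UTF-32, Latin-1 and Latin-2 are ways of turning its code points into bytes, the last two covering only small subsets) can be seen in a short sketch. It is written in Python purely for illustration and does not use or mirror the XMLAda Unicode packages; the helper name `encode_all` is hypothetical.

```python
# Illustrative sketch only (not XMLAda): one Unicode code point, several
# byte encodings. UTF-8 and UTF-32 can encode any code point; Latin-1
# (ISO 8859-1) and Latin-2 (ISO 8859-2) each cover only a subset.
from typing import Dict, Optional

ENCODINGS = ["utf-8", "utf-32-be", "latin-1", "iso8859-2"]


def encode_all(ch: str) -> Dict[str, Optional[bytes]]:
    """Hypothetical helper: encode ch with each codec; None = not representable."""
    result: Dict[str, Optional[bytes]] = {}
    for name in ENCODINGS:
        try:
            result[name] = ch.encode(name)
        except UnicodeEncodeError:
            # The code point lies outside the subset this codec covers.
            result[name] = None
    return result


if __name__ == "__main__":
    # "A", o with diaeresis, l with stroke, euro sign
    for ch in ["A", "\u00f6", "\u0142", "\u20ac"]:
        row = {name: (b.hex() if b is not None else "-")
               for name, b in encode_all(ch).items()}
        print(f"U+{ord(ch):04X}", row)
```

For U+00F6 (ö) the sketch yields `c3b6` in UTF-8, `000000f6` in UTF-32, and the single byte `f6` in both Latin-1 and Latin-2, while U+0142 (ł) exists only in Latin-2 and U+20AC (€) in neither: the same character, different encodings, and different subsets.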
Well, I have worked a bit more with that library, and it seems that there are special variants of UTF-8, and that you can place an info block at the beginning of a UTF-8 string for fine-tuning. UTF-16 and UTF-32 are variable-length encodings as well, just in case extraterrestrials finally drop in and we need 64-bit character sets. So XMLAda seems more complete than the average Unicode implementation.

With Regards

Martin
-- 
mailto://krischik@users.sourceforge.net
http://www.ada.krischik.com