From: Martin Krischik
Newsgroups: comp.lang.ada
Subject: Re: Supporting full Unicode
Date: Wed, 12 May 2004 12:43:22 +0200
Organization: AdaCL
Message-ID: <1326387.jKy1seHaPs@linux1.krischik.com>
References: <9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net> <2004512-94456-948110@foorum.com>
Reply-To: krischik@users.sourceforge.net

Marius wrote:

>> But I would favour using UTF-8 as the internal encoding anyway. It is
>> easy to define a UTF8_String type similar to the above. GtkAda has
>> such a type, as GTK+ uses UTF-8 as both internal and external
>> encoding.

> Indeed UTF-8 seems to rule. Probably because there are more ready-to-use
> low level tools for 8-bit characters. Actually the proper tools for
> Unicode should be 24-bit based. An ugly fact about Unicode is that the
> code space is 24-bit and the encodings are all but 24 (8, 16, 32).

Not quite right. The ISO 10646 code space was originally defined with 31 bits
(stored in 32), of which Unicode currently uses only 21 (code points up to
U+10FFFF), so everything in use today fits into 24 bits. That of course means
that UTF-8 needs at most 4 bytes per character.
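
For illustration, a minimal Python sketch (Python rather than Ada, only
because it is quick to try out; the helper name `utf8_len` is made up for this
example) that checks the byte counts with Python's built-in UTF-8 codec:

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes the UTF-8 encoding of one code point needs."""
    return len(chr(code_point).encode("utf-8"))


# Highest code point of each UTF-8 length class; the last entry is the
# top of the code space that Unicode currently uses.
for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF):
    print(f"U+{cp:04X}: {cp.bit_length()} bits, {utf8_len(cp)} byte(s) in UTF-8")
```

The last line it prints shows that U+10FFFF needs 21 bits and 4 bytes in
UTF-8.
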
However, the code space may grow again when the extraterrestrials arrive ;-).
Any program that assumes only 24 bits will break then.

Won't happen? Well, up until recently only 16 bits were used, and programmers
freely mixed UTF-16 and UCS-2. But then the archaeologists came.

Of course, currently we repeat that mistake: UTF-32 is limited to code points
up to U+10FFFF, while UCS-4 was defined over the full 31-bit space, so the two
should not be mixed either.

With regards

Martin

-- 
mailto://krischik@users.sourceforge.net
http://www.ada.krischik.com