From: "Marius Amado Alves"
Newsgroups: comp.lang.ada
Subject: Re: Supporting full Unicode
Date: Wed, 12 May 2004 09:23:51 +0100

> But I would favour using UTF-8 as the internal encoding anyway. It is
> easy to define a UTF8_String type similar to the above.
> GtkAda has such a type, as GTK+ uses UTF-8 as both internal and external
> encoding.

Indeed UTF-8 seems to rule, probably because there are more ready-to-use low-level tools for 8-bit characters. Actually the proper tools for Unicode should be 24-bit based. An ugly fact about Unicode is that the code space fits in 24 bits (strictly, 21 bits cover U+0000 to U+10FFFF), while the encoding units are anything but 24 bits wide (8, 16, 32).
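To make the size mismatch concrete, here is a small sketch in Python (chosen only for brevity, it is not Ada and not anything from GtkAda). It checks how many bits the highest code point U+10FFFF needs, and how many bytes one character occupies in UTF-8, UTF-16 and UTF-32. The helpers `encoded_sizes`, `to_3_bytes` and `from_3_bytes` are hypothetical names made up for this illustration, and the 3-byte packing is not a standard Unicode encoding form, just what a 24-bit unit might look like.

```python
# Sketch: width of the Unicode code space versus the width of the
# standard encoding units. All helper names here are illustrative only.

MAX_CODE_POINT = 0x10FFFF  # highest code point in the Unicode code space


def encoded_sizes(ch: str) -> dict:
    """Byte length of a single character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),  # "-le" avoids counting a BOM
        "utf-32": len(ch.encode("utf-32-le")),
    }


def to_3_bytes(code_point: int) -> bytes:
    """Hypothetical fixed-width 24-bit packing (NOT a standard encoding)."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return code_point.to_bytes(3, "big")


def from_3_bytes(data: bytes) -> int:
    """Inverse of to_3_bytes."""
    return int.from_bytes(data, "big")


if __name__ == "__main__":
    # Number of bits needed for the largest code point.
    print("bits needed:", MAX_CODE_POINT.bit_length())
    # One character from each UTF-8 length class (1, 2, 3 and 4 bytes).
    for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

Under these assumptions every code point fits in three bytes, whereas UTF-32 always spends four, and UTF-8 and UTF-16 need a variable number of units per character.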