From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,d7340a24f4e8fef1
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2004-02-15 14:31:08 PST
Path: 
 archiver1.google.com!news2.google.com!news.maxwell.syr.edu!wn14feed!worldnet.att.net!bgtnsc05-news.ops.worldnet.att.net.POSTED!not-for-mail
From: David Starner <dvdeug@email.ro>
Subject: Re: UTF-8 (was: AI-285 - Comment from Unicode list)
User-Agent: Pan/0.14.2 (This is not a psychotic episode. It's a cleansing
 moment of clarity. (Debian GNU/Linux))
Message-Id: <pan.2004.02.15.22.02.32.150318@email.ro>
Newsgroups: comp.lang.ada
References: <pan.2004.02.14.23.56.49.56153@email.ro>
 <VYydnQzyisFzGLLdRVn-hg@gbronline.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Date: Sun, 15 Feb 2004 22:31:08 GMT
NNTP-Posting-Host: 12.72.70.80
X-Complaints-To: abuse@worldnet.att.net
X-Trace: bgtnsc05-news.ops.worldnet.att.net 1076884268 12.72.70.80 (Sun,
 15 Feb 2004 22:31:08 GMT)
NNTP-Posting-Date: Sun, 15 Feb 2004 22:31:08 GMT
Organization: AT&T Worldnet
Xref: archiver1.google.com comp.lang.ada:5585
Date: 2004-02-15T22:31:08+00:00
List-Id: <comp.lang.ada>

On Sun, 15 Feb 2004 09:45:02 -0500, Wes Groleau wrote:
> I'd like to see a package (or built-in) to support UTF-8.
> But that's just me.  I do a little bit of Polish and Japanese
> and might do a little Burmese, so I need Unicode.  But since
> I'm mostly English and Spanish and French, if I used UTF-16
> my files would be 49.x% zero bytes.

But the internal character set has nothing to do with the external. We
could output UTF-8 and use UTF-16 or UTF-32 internally. In fact, if you
set the character set of the source code to UTF-8 with GNAT, it will input
and output UTF-8. (This is not a great design, IMO.)
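
To make the internal/external split concrete, here is a rough sketch in
Python (not Ada, and not how GNAT does it). The helper name encoded_sizes
is made up for illustration. The string is held internally in whatever
form the runtime likes; the encoding is only chosen when bytes are
written out.

```python
def encoded_sizes(text):
    """Return the byte length of text in several external encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # explicit byte order, no BOM
        "utf-32": len(text.encode("utf-32-le")),
    }

english = "plain English text"
sizes = encoded_sizes(english)

# For pure ASCII text, every second byte of the UTF-16 form is zero.
zero_bytes = english.encode("utf-16-le").count(0)

# Reading UTF-8 in and writing UTF-8 out does not constrain the
# in-memory representation at all.
round_trip = english.encode("utf-8").decode("utf-8")
```

For ASCII-only text the UTF-16 output is twice the size of the UTF-8
output, which is the zero-byte overhead complained about above; nothing
forces that choice on the in-memory form.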
 
> I have often been tempted to write such a package. Has it already been
> done?

http://sourceforge.net/projects/ngeadal/ will do it, among a few other
Unicode-related things. I never really completed it, and it doesn't have
any sort of stream I/O (instead dumping files as a whole), but it should
work, and I'm willing to answer questions.

> I admit it--I don't even know what UCS-2 is.  :-)

Unicode is broken down into 17 planes, 4 of which are used in any way. All
but one were empty until a couple of years ago. UCS-2 is like UTF-16, but
doesn't support the surrogate pairs needed to access planes besides
the first. That means that Gothic, Linear-A, Cuneiform (in the future)
won't be supported; but it also means that the mathematical alphanumerics
and Cantonese won't be supported, as well as a lot of older literary
Chinese, Japanese, Korean and Vietnamese, and other minor Chinese
languages.
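
As a rough illustration of the difference (again in Python rather than
Ada; the helper names plane and utf16_units are made up here), a
character outside the first plane takes two 16-bit code units in UTF-16,
and a UCS-2 system has no way to express that pair:

```python
def plane(ch):
    """Return the Unicode plane (0..16) of a single character."""
    return ord(ch) >> 16

def utf16_units(ch):
    """Return the UTF-16 code units for a single character."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

latin_a = "A"                  # U+0041, plane 0: fine in UCS-2
gothic_ahsa = "\U00010330"     # GOTHIC LETTER AHSA, plane 1
math_bold_a = "\U0001D400"     # MATHEMATICAL BOLD CAPITAL A, plane 1

for ch in (latin_a, gothic_ahsa, math_bold_a):
    units = utf16_units(ch)
    kind = "single unit" if len(units) == 1 else "surrogate pair"
    print(f"U+{ord(ch):04X} plane {plane(ch)}: "
          f"{[hex(u) for u in units]} ({kind})")
```

Anything that comes out as a surrogate pair is exactly what UCS-2
cannot represent.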