comp.lang.ada
From: David Starner <dvdeug@email.ro>
Subject: Re: UTF-8 (was: AI-285 - Comment from Unicode list)
Date: Sun, 15 Feb 2004 22:31:08 GMT
Message-ID: <pan.2004.02.15.22.02.32.150318@email.ro>
In-Reply-To: VYydnQzyisFzGLLdRVn-hg@gbronline.com

On Sun, 15 Feb 2004 09:45:02 -0500, Wes Groleau wrote:
> I'd like to see a package (or built-in) to support UTF-8.
> But that's just me.  I do a little bit of Polish and Japanese
> and might do a little Burmese, so I need Unicode.  But since
> I'm mostly English and Spanish and French, if I used UTF-16
> my files would be 49.x% zero bytes.

But the internal encoding has nothing to do with the external one. We
could output UTF-8 and use UTF-16 or UTF-32 internally. In fact, if you
set the character set of the source code to UTF-8 with GNAT, it will input
and output UTF-8 as well. (This is not a great design, IMO.)
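
To make the internal/external split concrete, here is a minimal sketch in
Python (not Ada, and not how GNAT does it). The sample sentence is made up
for illustration. It keeps one internal string, writes it out in two external
encodings, and counts the zero bytes that the UTF-16 form carries for
Western European text.

```python
# Illustrative only: the sample text is an assumption, not from the thread.
text = "El niño comió una crêpe."    # internal form: a sequence of code points

utf8 = text.encode("utf-8")          # one possible external form
utf16 = text.encode("utf-16-le")     # another external form (no byte-order mark)

# Every character here is below U+0100, so each UTF-16 code unit
# has a zero high byte.
zero_bytes = utf16.count(0)
zero_fraction = zero_bytes / len(utf16)

# Decoding either external form gives back the same internal string.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text

print(len(text), len(utf8), len(utf16), zero_fraction)
```

The choice of external encoding only changes the bytes on disk; the program
works with the same `text` either way.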
 
> I have often been tempted to write such a package. Has it already been
> done?

http://sourceforge.net/projects/ngeadal/ will do it, among a few other
Unicode-related things. I never really completed it, and it doesn't have
any sort of stream I/O (it reads and writes files as a whole instead), but
it should work, and I'm willing to answer questions.

> I admit it--I don't even know what UCS-2 is.  :-)

Unicode is broken down into 17 planes, 4 of which are used in any way. All
but one were empty until a couple of years ago. UCS-2 is like UTF-16, but
doesn't support the surrogate code points needed to access planes besides
the first. That means that Gothic, Linear-A, Cuneiform (in the future)
won't be supported; but it also means that the mathematical alphanumerics
and Cantonese won't be supported, as well as a lot of older literary
Chinese, Japanese, Korean and Vietnamese, and other minor Chinese
languages.
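
As a rough illustration of what the surrogates buy you, here is a small
Python sketch of the standard UTF-16 surrogate-pair arithmetic. The function
names `utf16_code_units` and `fits_in_ucs2` are hypothetical, chosen for this
example; the test character is GOTHIC LETTER AHSA (U+10330), which lives in
plane 1.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        # First plane: a single code unit, identical in UCS-2 and UTF-16.
        return [code_point]
    # Other planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


def fits_in_ucs2(code_point):
    """UCS-2 has no surrogate mechanism, so only the first plane fits."""
    return code_point < 0x10000


GOTHIC_AHSA = 0x10330
units = utf16_code_units(GOTHIC_AHSA)
print([hex(u) for u in units], fits_in_ucs2(GOTHIC_AHSA))

# Number of planes: the code space U+0000..U+10FFFF in blocks of 0x10000.
planes = 0x10FFFF // 0x10000 + 1
```

A UCS-2-only implementation has no way to express the second code unit as
part of a pair, which is why everything outside the first plane is out of
reach.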


