comp.lang.ada
From: Jacob Sparre Andersen <sparre@nbi.dk>
Subject: Re: Character Sets (plain text police report)
Date: Sun, 01 Dec 2002 12:28:14 +0100
Message-ID: <3DE9F24E.3010002@nbi.dk>
In-Reply-To: asaj6c$ts5$1@slb9.atl.mindspring.net

Marin David Condic wrote:
> It might make an easy extension to the Ada standard to include 32-bit
> Unicode. After all, it's pretty much just a matter of taking existing
> packages and changing a few things so you could have Wide_Wide_Character.
> The question is, would it have sufficient utility to make it worth the
> effort? (Is there much use out there for 32-bit characters?)

Maybe not directly (except in the Far East), but there 
is a rather large and growing indirect need for full support 
for ISO-10646.

In Europe people are starting to switch from ISO-8859 
encodings to the UTF-8 encoding of ISO-10646.  This means 
that although in practice people will seldom use more than 
the 470-something European characters, they will start to 
expect access to all of ISO-10646.
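
To illustrate why UTF-8 is attractive for mostly-European text, here 
is a minimal Python sketch comparing the size of a character in UTF-8 
and in raw 32-bit form.  The sample characters and the helper names 
(utf8_length, utf32_length) are assumptions chosen for illustration.

```python
# Sample code points (illustrative choices, not taken from the discussion):
# an ASCII letter, a Latin-1 letter, the euro sign, and a character
# outside the 16-bit range.
samples = ["A", "\u00e6", "\u20ac", "\U0001D11E"]


def utf8_length(ch: str) -> int:
    """Number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


def utf32_length(ch: str) -> int:
    """Number of bytes needed in raw 32-bit form (big-endian, no byte order mark)."""
    return len(ch.encode("utf-32-be"))


for ch in samples:
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_length(ch)} byte(s), "
          f"32-bit {utf32_length(ch)} byte(s)")
```

ASCII text stays one byte per character in UTF-8, while the raw 
32-bit form always costs four bytes per character.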

> Perhaps if some additional utility was piled on top of it so that reading a
> text file, Ada would automatically determine what it was looking at and give
> you back text in the proper size (create something like "Universal_String"
> and a whole bunch of utilities around it so it would hold 8, 16 or 32-bit
> characters depending on how it was loaded) - but I don't see how that could
> be done for all text files.

Agreed.  One needs some kind of information about which 
encoding is used - but that is already the case.  The best 
solution I can think of is to require that the operating 
system keep track of the file type (including the encoding 
for text files).  The second best solution is (IMHO) to 
introduce a sensible common standard encoding.  I don't know 
if it should be UTF-8 or raw 32-bit ISO-10646.  And I 
certainly cannot advise people to use the current procedure 
on Unix systems, where each user chooses his/her assumed 
encoding of text files.
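
The problem with an assumed encoding can be shown with a small Python 
sketch.  The sample word and the helper try_decode are hypothetical, 
picked only to show that the bytes alone do not say how they should 
be read.

```python
# One Danish word stored under two different encodings.
text = "\u00e6ble"
utf8_bytes = text.encode("utf-8")          # b'\xc3\xa6ble'
latin1_bytes = text.encode("iso-8859-1")   # b'\xe6ble'


def try_decode(data: bytes, encoding: str):
    """Decode data with the assumed encoding; return None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Right assumption: the original text comes back.
print(repr(try_decode(utf8_bytes, "utf-8")))
# Wrong assumption, no error: every byte is valid ISO-8859-1, so the
# reader silently gets two wrong characters in place of one.
print(repr(try_decode(utf8_bytes, "iso-8859-1")))
# Wrong assumption, detected: this byte sequence is not valid UTF-8.
print(repr(try_decode(latin1_bytes, "utf-8")))
```

Only the last case is detectable from the bytes themselves, which is 
why the encoding has to be recorded somewhere outside the text.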

> The concept is a little vague in my mind, but I could imagine how something
> like this might be a useful idea for a standard Ada library. It really
> doesn't require any fundamental changes to the language.

No.  But it would be nice if one could require that 
compilers handle UTF-8 or raw 32-bit ISO-10646 encoded 
source files.

Greetings,

Jacob
-- 
"I don't want to gain immortality in my works.
  I want to gain it by not dying."



