comp.lang.ada
From: "Marin David Condic" <mcondic.auntie.spam@acm.org>
Subject: Re: Character Sets (plain text police report)
Date: Sun, 1 Dec 2002 09:38:18 -0500
Date: 2002-12-01T14:38:43+00:00
Message-ID: <asd6tj$isb$1@slb2.atl.mindspring.net>
In-Reply-To: 3DE9F24E.3010002@nbi.dk

Jacob Sparre Andersen <sparre@nbi.dk> wrote in message
news:3DE9F24E.3010002@nbi.dk...
> > effort? (Is there much use out there for 32-bit characters?)
>
> Maybe not directly (except for in the far east), but there
> is a rather large and growing indirect need for full support
> for ISO-10646.
>
My understanding was that the 16-bit characters cover most of the
practical uses one would find in modern languages. The reason for the 32-bit
characters was to provide for things that are truly obscure (Egyptian
hieroglyphics and such) or for other special character sets, where it may
not be that big a deal if Ada doesn't support them.
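
To make the 16-bit versus 32-bit distinction concrete, here is a small
sketch in Python (not Ada, and purely illustrative). It checks whether a
character's code point fits in 16 bits and how many bytes it takes in three
common encodings. The sample characters and the function names are
assumptions chosen for the illustration, not anything from an Ada library.

```python
# Illustrative only: which characters fit in a 16-bit unit, and how many
# bytes each needs in UTF-8, UTF-16 and UTF-32.

# Sample characters picked for the illustration (an assumption, not a survey).
SAMPLES = {
    "LATIN SMALL LETTER A": "a",
    "LATIN SMALL LETTER AE": "\u00e6",
    "CJK IDEOGRAPH U+4E2D": "\u4e2d",
    "EGYPTIAN HIEROGLYPH U+13000": "\U00013000",
}


def fits_in_16_bits(ch: str) -> bool:
    """Return True when the character's code point is at most 0xFFFF."""
    return ord(ch) <= 0xFFFF


def encoded_sizes(ch: str) -> dict:
    """Return the byte count of one character in three encodings."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"U+{ord(ch):06X} {name}: "
              f"fits in 16 bits={fits_in_16_bits(ch)} {encoded_sizes(ch)}")
```

The hieroglyph is the only sample that does not fit in 16 bits; everything
else here would be representable in a 16-bit character type.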



> In Europe people are starting to switch from ISO-8859
> encodings to the UTF-8 encoding of ISO-10646.  This means
> that although people in practice seldom will use more than
> the 470-something European characters, they will start to
> expect to have access to use all of ISO-10646.
>
So possibly if there was some kind of variant of Text_IO that dealt with
UTF-8 files, it might be useful. You'd need special data types and
operations, but that wouldn't be insurmountable. Some set of packages that
would be wrapped around UTF-8 as an extension to Ada or part of a standard
Ada library might make sense.
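
As a rough idea of the "operations" such a package would need, here is a
minimal sketch of a UTF-8 decoder that turns a byte stream into 32-bit code
points. It is written in Python for brevity rather than Ada, and the name
decode_utf8 is hypothetical; an Ada version would presumably return a
string of some 32-bit character type instead of a list of integers.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of integer code points.

    Raises ValueError on malformed input (bad lead byte, truncated or
    invalid continuation bytes, overlong forms, surrogates, out of range).
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # The lead byte determines how many continuation bytes follow and
        # the smallest code point that legitimately needs that length.
        if b0 < 0x80:
            need, cp, lowest = 0, b0, 0x0
        elif 0xC2 <= b0 <= 0xDF:
            need, cp, lowest = 1, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            need, cp, lowest = 2, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            need, cp, lowest = 3, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")

        if i + need >= n and need > 0:
            raise ValueError(f"truncated sequence at offset {i}")

        # Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for k in range(1, need + 1):
            b = data[i + k]
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte at offset {i + k}")
            cp = (cp << 6) | (b & 0x3F)

        if cp < lowest:
            raise ValueError(f"overlong encoding at offset {i}")
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")

        out.append(cp)
        i += need + 1
    return out


if __name__ == "__main__":
    print([hex(cp) for cp in decode_utf8("a\u00e6\u4e2d".encode("utf-8"))])
```

The point of the sketch is only that the decoding itself is small; the
harder design questions are the data types and how they fit with Text_IO.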



>
> Agreed.  One needs some kind of information about which
> encoding is used - but that is already the case.  The best
> solution I can think of is to demand that the operating
> system keeps track of the file type (including encoding for
> text files).  The second best solution is (IMHO) to
> introduce a sensible common standard encoding.  I don't know
> if it should be UTF-8 or raw 32-bit ISO-10646.  And I can
> certainly not advise people to use the current procedure on
> Unix systems, where each user chooses his/her assumed
> encoding of text files.
>
You'd almost certainly want some indication from the OS that a file was a
UTF-8 file. The "Form" parameter in the Text_IO.Open procedure would be the
natural place to specify it, I'd think. Or if it were a set of new
packages, the underlying implementation would want a means of checking that
the file was of the appropriate type. The alternative is to dump it on the
user's head - as one generally must with Unix OSes, since files there tend to
be viewed as a stream of bytes. "I ask you for a UTF-8 input file and if you
give me a relational database file, well, that's your tough luck..."
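
Where the OS gives no indication of the encoding, about the best an
implementation can do is check that the bytes are at least well-formed
UTF-8. A minimal sketch of that check, in Python rather than Ada, follows.
The function name looks_like_utf8 and the chunk size are assumptions for
the illustration. Note that passing the check does not prove the file was
meant as UTF-8 (plain ASCII passes too); it only rejects files that cannot
be UTF-8.

```python
import codecs


def looks_like_utf8(path: str, chunk_size: int = 65536) -> bool:
    """Return True if the whole file decodes as well-formed UTF-8.

    The file is read in chunks through an incremental decoder, so a
    multi-byte sequence split across two chunks is handled correctly.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # final=True makes a sequence truncated at end of file an error.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

A caller would run this before handing the file to the text-reading code,
and report a clear error rather than producing garbage characters.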



>
> No.  But it would be nice, if one could demand that
> compilers can handle UTF-8 or raw 32-bit ISO-10646 encoded
> source files.
>
That sounds like an implementation issue. (You're talking about the Ada
compiler eating Ada source that is in UTF-8? No reason that can't be done
without a language revision.) Otherwise, I'd think you could provide all the
tools by creating a Wide_Wide_Character and Wide_Wide_String type and
providing all the customary packages that would involve. From there,
additional utility probably should come from a standard Ada library so that
it could be enhanced and extended without formal language revision.

MDC
--
======================================================================
Marin David Condic
I work for: http://www.belcan.com/
My project is: http://www.jast.mil/

Send Replies To: m c o n d i c @ a c m . o r g

    "I'd trade it all for just a little more"
        --  Charles Montgomery Burns, [4F10]
======================================================================





