comp.lang.ada
From: Adam Beneschan <adam@irvine.com>
Subject: Re: Wide_[Wide_]Character
Date: Tue, 22 Jul 2008 12:18:41 -0700 (PDT)
Message-ID: <072713a0-7c1f-4e29-a2e7-4f43a89f6ebf@m45g2000hsb.googlegroups.com>
In-Reply-To: MrNoSpam-A54511.17443812072008@news-server.bigpond.net.au

On Jul 12, 12:44 am, Dale Stanbrough <MrNoS...@bigpoop.net.au> wrote:
> Unicode can be represented using UTF-8, UTF-16 and UTF-32 (amongst
> others).
>
> I gather that Character is simply ISO-8859-1 (Latin-1).
>
> I suspect that Wide_Character is UCS-2 (simple 2 byte values, no escapes
> like UTF-16).
>
> Is Wide_Wide_Character
>
>    * UTF-16
>    * UTF-32 (i.e. UCS-4)
>    * System dependent
>    * Something else
>
> Thanks,
>
> Dale

I'm not convinced that the question makes sense.  Wide_Character
refers to an enumeration type with 2**16 literals, where
Wide_Character'Val(N) denotes the corresponding character in the ISO
10646 Basic Multilingual Plane, i.e. the first 65536 code points of
Unicode.  Unicode is a *character* *set*, i.e. a definition of what
character corresponds to each integer; by itself that says nothing
about how characters are represented.  Wide_Wide_Character is
similarly an enumeration type, with 2**31 literals covering the
whole ISO 10646 code space.
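
To make the character-set versus encoding distinction concrete, here
is a minimal sketch.  It is in Python rather than Ada, purely because
the codec calls are compact; chr() is only a rough stand-in for
Wide_Character'Val, and the choice of the euro sign is mine.  The same
code point comes out as three different byte sequences:

```python
# One abstract character (a code point) and three byte encodings of it.
code_point = 0x20AC  # EURO SIGN, inside the Basic Multilingual Plane

ch = chr(code_point)          # roughly what Wide_Character'Val(16#20AC#) denotes
assert ord(ch) == code_point  # the character set maps integer <-> character

# The encodings are a separate layer: same character, different bytes.
encodings = {
    "UTF-8": ch.encode("utf-8"),
    "UTF-16BE": ch.encode("utf-16-be"),
    "UTF-32BE": ch.encode("utf-32-be"),
}

for name, raw in encodings.items():
    print(name, raw.hex())
# UTF-8 e282ac
# UTF-16BE 20ac
# UTF-32BE 000020ac
```

None of those byte strings "is" the character; each is just one way
of writing the integer 16#20AC# down.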

When a sequence of characters is represented in internal memory, it's
up to an implementation to decide how to represent each character in
memory.  But in most cases, it makes no sense to represent it as
anything other than a flat array.  Thus, a Wide_String would be, in
essence, an array of 16-bit integers, and a Wide_Wide_String would be
an array of 32-bit integers.  If it were represented otherwise, how
could a program access, say, S(1000) where S is declared as a
Wide_Wide_String(1..2000)?  If it were represented as, say, UTF-8 or
UTF-16, the program would have to start at the beginning of the string
and do an expensive search every time it wanted to access one
particular character of the string.  This would not make sense.  So I
think that any implementation would implement those character (and
string) types as an integer (or array of integers), with whatever
endianness is most convenient for that processor.
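
The indexing argument can be sketched as follows.  Again this is
Python used for illustration, not a claim about how any Ada compiler
works: array("L") stands in for a Wide_Wide_String, and utf8_len and
utf8_nth are hypothetical helpers I made up for the example.  The
point is that the flat array gets to element 1000 directly, while the
UTF-8 version has to walk over the 999 characters in front of it:

```python
from array import array


def utf8_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def utf8_nth(data: bytes, n: int) -> int:
    """Code point at 0-based position n, found by scanning from the start."""
    pos = 0
    for _ in range(n):                 # one step per preceding character
        pos += utf8_len(data[pos])
    width = utf8_len(data[pos])
    return ord(data[pos:pos + width].decode("utf-8"))


text = "a\u00e9\u20ac\U0001d11e" * 500      # 2000 characters of mixed width
flat = array("L", (ord(c) for c in text))   # fixed-width, one slot per character
encoded = text.encode("utf-8")              # 1 to 4 bytes per character

# Ada's S(1000) with a 1-based index is position 999 here.
print(hex(flat[999]))               # direct indexing
print(hex(utf8_nth(encoded, 999)))  # same answer, after a linear scan
```

Both lines print the same code point; only the amount of work
differs, and it grows with the index in the UTF-8 case.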

When a sequence of characters is represented in a file (or is
communicated some other way e.g. over a socket), the characters may
well be encoded as UTF-8 or UTF-16 or something.  The language doesn't
define how different encodings are handled.  I believe GNAT uses the
"form" parameter when a file is opened or created to specify the
encoding; it supports a number of different possible encodings,
because different files that come from different places may be encoded
in different ways.  When a line is read from one of those files into
memory, though, I'm sure that the runtime will convert it to an
internal representation that is a flat array.
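
As a rough analogy only (this is Python's open(), not GNAT's Form
parameter, whose exact syntax I'm not reproducing here), the sketch
below writes the same line to a temporary file in two encodings and
reads it back.  The helper name write_then_read is made up for the
example.  Whatever the external encoding, what the program ends up
holding is the same flat sequence of code points:

```python
import os
import tempfile

text = "Ada \u03bb \u20ac"   # 'Ada', GREEK SMALL LETTER LAMDA, EURO SIGN


def write_then_read(encoding: str) -> list:
    """Round-trip one line through a file, naming the encoding at open time."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding) as f:
            f.write(text + "\n")
        with open(path, "r", encoding=encoding) as f:
            line = f.readline().rstrip("\n")
        # After decoding, each character is one integer in a flat list.
        return [ord(c) for c in line]
    finally:
        os.remove(path)


from_utf8 = write_then_read("utf-8")
from_utf16 = write_then_read("utf-16")
print(from_utf8 == from_utf16)   # True
```

The bytes on disk differ between the two files, but the program never
has to care once the runtime has done the decoding.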

I'm not sure whether this tells you what you need to know.  If not,
tell us why you're asking the question (i.e. what you want to
accomplish), and we'll have a better idea of what we need to tell
you.  If you're trying to do some sort of overlay, where you read in
raw bytes from a file and then use Unchecked_Conversion or something
to convert them to a Wide_Wide_String, or something of that nature,
my advice is: Just don't do that.
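
Here is a small illustration of why the overlay goes wrong, assuming
the file happens to be UTF-8 (my assumption; the original question
didn't say).  It is Python, with struct.unpack playing the part of an
unchecked reinterpretation of the raw bytes as 32-bit values:

```python
import struct

text = "Ada \u20ac!"            # 6 characters
raw = text.encode("utf-8")      # 8 bytes: 41 64 61 20 e2 82 ac 21

# Overlay: pretend the 8 bytes are two little-endian 32-bit characters.
overlaid = struct.unpack("<2I", raw)

# Decode: let a UTF-8 decoder turn the bytes into code points.
decoded = [ord(c) for c in raw.decode("utf-8")]

print([hex(v) for v in overlaid])  # ['0x20616441', '0x21ac82e2']
print([hex(v) for v in decoded])   # ['0x41', '0x64', '0x61', '0x20', '0x20ac', '0x21']
```

The overlaid values aren't even valid code points (both are above
16#10FFFF#), and the character count is wrong as well.  Decoding is
the only version that gives back what was written.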

P.S. I know I'm coming in late to this thread---I just got back from
vacation.  If your question has already been answered, my apologies.

                                -- Adam


