Re: Hebrew language character set

comp.lang.ada

From: Britt Snodgrass <britt@adapower.net>
Subject: Re: Hebrew language character set
Date: Thu, 05 Apr 2001 13:27:04 -0500
Message-ID: <3ACCB8F8.C7153457@adapower.net>
In-Reply-To: 3ACCAA2D.523AA9B6@lmco.com


Paul Storm wrote:
> 
> I did a search of the gnat reference at
> http://lglwww.epfl.ch/docs/ada/gnat_ug.html#SEC128
> for "-gnatW8".  It produced no results.  I do see it as an option of the


The URL you used points to a very old, obsolete version of the GNAT
reference.  See the "Wide Character Encodings" section of the GNAT 3.13p
or 3.14a User's Guide, which should have been installed along with
your compiler.  I've never done this myself, but I imagine the device
driver for the output device you're writing to would have to know how to
display the Hebrew character codes.

From the GNAT 3.14a User's Guide:

Wide Character Encodings

GNAT allows wide character codes to appear in character and string
literals, and also optionally in identifiers, by means of the following
possible encoding schemes: 

Hex Coding
     In this encoding, a wide character is represented by the following
     five character sequence:

     ESC a b c d

     Where a, b, c, d are the four hexadecimal characters (using
     uppercase letters) of the wide character code. For example, ESC A345
     is used to represent the wide character with code 16#A345#. This
     scheme is compatible with use of the full Wide_Character set.
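
As an illustration only (this is not GNAT's code), the Hex Coding rule
above can be sketched in a few lines of Python. The function names
hex_encode and hex_decode are made up for this example; the sketch
assumes ESC is the ASCII escape character, code 16#1B#.

```python
ESC = "\x1b"  # assumption: ESC means the ASCII escape character (16#1B#)
HEX_DIGITS = "0123456789ABCDEF"  # the manual specifies uppercase letters


def hex_encode(code):
    """Return the five-character sequence ESC a b c d for a 16-bit code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return ESC + format(code, "04X")


def hex_decode(seq):
    """Return the 16-bit code represented by an ESC a b c d sequence."""
    if len(seq) != 5 or seq[0] != ESC:
        raise ValueError("expected ESC followed by four hex digits")
    digits = seq[1:]
    if any(ch not in HEX_DIGITS for ch in digits):
        raise ValueError("digits must be uppercase hexadecimal")
    return int(digits, 16)
```

With this sketch, hex_encode(0x05D0) gives ESC followed by 05D0, which
would be the Hex Coding form of the Hebrew letter alef (16#05D0# in
ISO 10646).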
 
Upper-Half Coding
     The wide character with encoding 16#abcd# where the upper bit is on
     (in other words, "a" is in the range 8-F) is represented as two
     bytes, 16#ab# and 16#cd#. The second byte cannot be a format control
     character, but is not required to be in the upper half. This method
     can be also used for shift-JIS or EUC, where the internal coding
     matches the external coding.
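
A rough Python sketch of the Upper-Half rule, for illustration only.
The name upper_half_encode is hypothetical, and the restriction on
format control characters in the second byte is not checked here,
because the excerpt does not say which byte values it covers.

```python
def upper_half_encode(code):
    """Split a 16-bit code whose top bit is set into bytes 16#ab#, 16#cd#."""
    if not 0x8000 <= code <= 0xFFFF:
        # The scheme as described only covers codes with the upper bit on.
        raise ValueError("upper-half coding needs a code in 16#8000#..16#FFFF#")
    # Note: the "no format control character" rule for the second byte
    # is deliberately not enforced in this sketch.
    return bytes([code >> 8, code & 0xFF])
```

Read this way, a Hebrew code such as 16#05D0# has its upper bit off,
so it falls outside what this scheme describes and the sketch rejects
it.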

Shift JIS Coding
     A wide character is represented by a two-character sequence, 16#ab#
     and 16#cd#, with the restrictions described for upper-half encoding
     as described above. The internal character code is the corresponding
     JIS character according to the standard algorithm for Shift-JIS
     conversion. Only characters defined in the JIS code set table can be
     used with this encoding method.

EUC Coding
     A wide character is represented by a two-character sequence 16#ab#
     and 16#cd#, with both characters being in the upper half. The
     internal character code is the corresponding JIS character according
     to the EUC encoding algorithm. Only characters defined in the JIS
     code set table can be used with this encoding method.
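
To make the EUC case concrete, here is a small Python sketch. It
assumes the usual EUC convention that the JIS code is obtained by
clearing the top bit of each byte; the excerpt does not spell out the
algorithm GNAT uses, so treat this as an assumption. The name
euc_to_jis is made up for the example.

```python
def euc_to_jis(b1, b2):
    """Map a two-byte EUC pair to a 16-bit JIS code (assumed convention)."""
    if not (0x80 <= b1 <= 0xFF and 0x80 <= b2 <= 0xFF):
        # Both bytes must be in the upper half, as the text above requires.
        raise ValueError("both EUC bytes must be in the upper half")
    # Assumption: the JIS code is each byte with its high bit cleared.
    return ((b1 & 0x7F) << 8) | (b2 & 0x7F)
```

For example, the byte pair 16#A4# 16#A2# maps to 16#2422# under this
convention. No check is made that the result is actually defined in
the JIS code set table.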

UTF-8 Coding
     A wide character is represented using UCS Transformation Format 8
     (UTF-8) as defined in Annex R of ISO 10646-1/Am.2. Depending on the
     character value, the representation is a one, two, or three byte
     sequence:

     16#0000#-16#007f#: 2#0xxxxxxx#
     16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
     16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#


     where the xxx bits correspond to the left-padded bits of the 16-bit
     character value. Note that all lower half ASCII characters are
     represented as ASCII bytes and all upper half characters and other
     wide characters are represented as sequences of upper-half bytes.
     (The full UTF-8 scheme allows for encoding 31-bit characters as
     6-byte sequences, but in this implementation, all UTF-8 sequences of
     four or more bytes length will be treated as illegal.)
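
The three-row table above translates fairly directly into code. The
following Python sketch is only an illustration of that table, not of
GNAT's implementation; the name utf8_encode_16 is hypothetical, and
like the table it handles 16-bit codes only.

```python
def utf8_encode_16(code):
    """Encode a 16-bit character code as 1, 2, or 3 UTF-8 bytes."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("only 16-bit codes are handled in this sketch")
    if code <= 0x007F:
        # 2#0xxxxxxx#
        return bytes([code])
    if code <= 0x07FF:
        # 2#110xxxxx# 2#10xxxxxx#
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])
```

For the Hebrew letter alef, code 16#05D0#, the second row applies and
the result is the two bytes 16#D7# 16#90#.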

Brackets Coding
     In this encoding, a wide character is represented by the following
     eight character sequence:

     [ " a b c d " ]

     Where a, b, c, d are the four hexadecimal characters (using
     uppercase letters) of the wide character code. For example, ["A345"]
     is used to represent the wide character with code 16#A345#. It is
     also possible (though not required) to use the Brackets coding for
     upper half characters. For example, the code 16#A3# can be
     represented as ["A3"]. This scheme is compatible with use of the full
     Wide_Character set, and is also the method used for wide character
     encoding in the standard ACVC (Ada Compiler Validation Capability)
     test suite distributions.
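
A small Python sketch of the Brackets rule, again only as an
illustration. The names brackets_encode and brackets_decode are made
up; the decoder accepts the four-digit form and the two-digit form
mentioned above for upper half characters, with uppercase hex digits
only.

```python
import re

# Matches ["abcd"] or the two-digit form ["ab"], uppercase hex only.
_BRACKETS = re.compile(r'\["([0-9A-F]{4}|[0-9A-F]{2})"\]')


def brackets_encode(code):
    """Return the eight-character sequence ["abcd"] for a 16-bit code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code must fit in 16 bits")
    return '["%04X"]' % code


def brackets_decode(text):
    """Replace every brackets sequence in text with the character it names."""
    return _BRACKETS.sub(lambda m: chr(int(m.group(1), 16)), text)
```

So a string literal containing the Hebrew letter alef could be written
in plain ASCII source as ["05D0"], which is what
brackets_encode(0x05D0) produces.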

Note: Some of these coding schemes do not permit the full use of the Ada
95 character set. For example, neither Shift JIS, nor EUC allow the use
of the upper half of the Latin-1 set.
