Hebrew language character set

comp.lang.ada

* Hebrew language character set
@ 2001-04-03 19:08 Paul Storm
  2001-04-03 19:42 ` Florian Weimer
  2001-04-04 17:35 ` David Botton
  0 siblings, 2 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-03 19:08 UTC (permalink / raw)


I'm a newbie to Ada.  As an exercise I wanted to write a simple Ada
program that would print out the Hebrew alphabet, from right to
left (as it would be written by hand).

Question: How do I select the Hebrew Character set?  I assume it is
something like Ada.Characters.Hebrew but I don't see a ready
reference in the RM95.  Of course I haven't thoroughly looked either.
I just figured I would do a quick post and see if any of you sharp
people knew off the top of your head.  Hey, I'm lazy. :-) Besides
this is a lark and not something I wanted to dump time into, ya know
what I mean?




* Re: Hebrew language character set
  2001-04-03 19:08 Hebrew language character set Paul Storm
@ 2001-04-03 19:42 ` Florian Weimer
  2001-04-03 23:05   ` Paul Storm
  2001-04-04 17:35 ` David Botton
  1 sibling, 1 reply; 28+ messages in thread
From: Florian Weimer @ 2001-04-03 19:42 UTC (permalink / raw)


Paul Storm <paul.a.storm@lmco.com> writes:

> Question: How do I select the Hebrew Character set?

A reasonably portable solution would involve the Hebrew Unicode
characters provided by Wide_Character.




* Re: Hebrew language character set
  2001-04-03 19:42 ` Florian Weimer
@ 2001-04-03 23:05   ` Paul Storm
  2001-04-04  3:09     ` David Starner
  2001-04-04  9:20     ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-03 23:05 UTC (permalink / raw)


Yes, I figured it would be using wide characters.  But which offset?
I know that standard Latin_1 offset is 00 in wide characters for the
ISO 8859-1.  But what is the offset for the Hebrew set?




* Re: Hebrew language character set
  2001-04-03 23:05   ` Paul Storm
@ 2001-04-04  3:09     ` David Starner
  2001-04-04  9:20     ` Florian Weimer
  1 sibling, 0 replies; 28+ messages in thread
From: David Starner @ 2001-04-04  3:09 UTC (permalink / raw)


On Tue, 03 Apr 2001 15:05:22 -0800, Paul Storm <paul.a.storm@lmco.com> wrote:
>Yes, I figured it would be using wide characters.  But which offset?
>I know that standard Latin_1 offset is 00 in wide characters for the
>ISO 8859-1.  But what is the offset for the Hebrew set?

See www.unicode.org . They have all the tables for it.

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg




* Re: Hebrew language character set
  2001-04-03 23:05   ` Paul Storm
  2001-04-04  3:09     ` David Starner
@ 2001-04-04  9:20     ` Florian Weimer
  1 sibling, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-04  9:20 UTC (permalink / raw)


Paul Storm <paul.a.storm@lmco.com> writes:

> Yes, I figured it would be using wide characters.  But which offset?
> I know that standard Latin_1 offset is 00 in wide characters for the
> ISO 8859-1.  But what is the offset for the Hebrew set?

Have a look at one of the following documents:

        http://charts.unicode.org/Web/U0590.html
        http://www.unicode.org/charts/PDF/U0590.pdf

I don't think it's just a question of the correct offset.
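As a rough illustration of the chart linked above: the Hebrew letters alef
through tav occupy U+05D0 .. U+05EA in the Unicode Hebrew block (27 code
points, counting the five final forms).  The sketch below just writes them
in code point order.  The procedure name is made up for the example, and
whether the result is shown right to left depends on the terminal or
viewer, not on the program.  With GNAT it would presumably need the
-gnatW8 switch discussed later in this thread, plus a UTF-8 capable
terminal.

```ada
with Ada.Wide_Text_IO;

procedure Hebrew_Letters is
   --  Hypothetical example name.  Bounds taken from the Unicode Hebrew
   --  chart: U+05D0 (alef) .. U+05EA (tav), final forms included.
   First : constant Wide_Character := Wide_Character'Val (16#05D0#);
   Last  : constant Wide_Character := Wide_Character'Val (16#05EA#);
begin
   --  Emit the letters in logical (code point) order; any right-to-left
   --  layout is left to whatever displays the output.
   for C in First .. Last loop
      Ada.Wide_Text_IO.Put (C);
   end loop;
   Ada.Wide_Text_IO.New_Line;
end Hebrew_Letters;
```

So the "offset" is only part of the story: the code points are easy to
produce, but the output encoding and the display still have to cooperate.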




* Re: Hebrew language character set
  2001-04-03 19:08 Hebrew language character set Paul Storm
  2001-04-03 19:42 ` Florian Weimer
@ 2001-04-04 17:35 ` David Botton
  2001-04-04 19:26   ` Paul Storm
  2001-04-04 21:36   ` Paul Storm
  1 sibling, 2 replies; 28+ messages in thread
From: David Botton @ 2001-04-04 17:35 UTC (permalink / raw)
  To: comp.lang.ada

What operating system are you using? There are a number of ways to do this.

For example, if you are running under a DOS or Linux terminal with a
Hebrew font loaded, you would use character codes of 128 and above
(that is, outside the 7-bit ASCII range).

David Botton


* Re: Hebrew language character set
  2001-04-04 17:35 ` David Botton
@ 2001-04-04 19:26   ` Paul Storm
  2001-04-04 21:36   ` Paul Storm
  1 sibling, 0 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-04 19:26 UTC (permalink / raw)


I have access to both NT and Sun Unix.




* Re: Hebrew language character set
  2001-04-04 17:35 ` David Botton
  2001-04-04 19:26   ` Paul Storm
@ 2001-04-04 21:36   ` Paul Storm
  2001-04-05  3:03     ` David Starner
                       ` (2 more replies)
  1 sibling, 3 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-04 21:36 UTC (permalink / raw)


For those that are interested, I produced code that (I think) produces
a Hebrew character.  Here is the code:

with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;

procedure aleph is
begin
  Ada.Wide_Text_IO.Put (Item => Wide_Character'Val(1488));
end aleph;

end of code

Here is the output.

["05D0"]

end of output

1488 decimal is 05D0 hexadecimal.

I said "(I think) produces".  I am thinking that my display showed the
character as a code due to the lack of support for that character
(set) on my system.  Can anyone confirm this for me?  Does that make
sense?
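The decimal/hex relationship can be checked from Ada itself.  A minimal
sketch, with a procedure name invented for the example, using only
Ada.Text_IO and Ada.Integer_Text_IO from the standard library:

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Show_Code is
   --  Hypothetical example name.  U+05D0 is HEBREW LETTER ALEF.
   Aleph : constant Wide_Character := Wide_Character'Val (16#05D0#);
begin
   --  Position number of the character in decimal: prints 1488
   Ada.Integer_Text_IO.Put (Wide_Character'Pos (Aleph), Width => 0);
   Ada.Text_IO.New_Line;

   --  Same value as an Ada based literal: prints 16#5D0#
   Ada.Integer_Text_IO.Put
     (Wide_Character'Pos (Aleph), Width => 0, Base => 16);
   Ada.Text_IO.New_Line;
end Show_Code;
```

Wide_Character'Val (1488) and Wide_Character'Val (16#05D0#) therefore
name the same character; the choice of base in the source makes no
difference to the program.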




* Re: Hebrew language character set
  2001-04-04 21:36   ` Paul Storm
@ 2001-04-05  3:03     ` David Starner
  2001-04-05  6:42     ` Ehud Lamm
  2001-04-05 13:11     ` Jean-Marc Bourguet
  2 siblings, 0 replies; 28+ messages in thread
From: David Starner @ 2001-04-05  3:03 UTC (permalink / raw)


On Wed, 04 Apr 2001 13:36:47 -0800, Paul Storm <paul.a.storm@lmco.com> wrote:
>For those that are interested, I produced code that (I think) produces
>a Hebrew character.  Here is the code:
>
>with Ada.Characters.Handling; use Ada.Characters.Handling;
>with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;
>
>procedure aleph is
>begin
>  Ada.Wide_Text_IO.Put (Item => Wide_Character'Val(1488));
>end aleph;
>
>end of code
>
>Here is the output.
>
>["05D0"]
>
>end of output
>
>1488 decimal is 05D0 hexadecimal.
>
>I said "(I think) produces".  I am thinking that my display showed the
>character as a code due to the lack of support for that character
>(set) on my system.  Can anyone confirm this for me?  Does that make
>sense?

Nope. It looks like your Ada system uses a ["abcd"] encoding by
default, since it has no way to know what the correct encoding
should be. If you're using GNAT, the GNAT RM has a section about
wide text IO and changing it to output in different encodings.
If you have a recent version (>4.0) of XFree86, you can run 
xterm -u8 which will properly understand UTF-8; otherwise, there's
probably no way to get readable Hebrew output.

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg




* Re: Hebrew language character set
  2001-04-04 21:36   ` Paul Storm
  2001-04-05  3:03     ` David Starner
@ 2001-04-05  6:42     ` Ehud Lamm
  2001-04-05 16:46       ` Paul Storm
  2001-04-05 13:11     ` Jean-Marc Bourguet
  2 siblings, 1 reply; 28+ messages in thread
From: Ehud Lamm @ 2001-04-05  6:42 UTC (permalink / raw)


Paul Storm <paul.a.storm@lmco.com> wrote in message
news:3ACB85DF.9E6DBD03@lmco.com...
> For those that are interested, I produced code that (I think) produces
> a Hebrew character.  Here is the code:
>
> with Ada.Characters.Handling; use Ada.Characters.Handling;
> with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;
>
> procedure aleph is
> begin
>   Ada.Wide_Text_IO.Put (Item => Wide_Character'Val(1488));
> end aleph;
>
> end of code
>
> Here is the output.
>
> ["05D0"]
>
> end of output
>

That's the output I get (the string literal, not the Hebrew character) on
my PC, which has Hebrew support, Hebrew-enabled Windows etc. (I am in Israel).


Ehud






* Re: Hebrew language character set
  2001-04-04 21:36   ` Paul Storm
  2001-04-05  3:03     ` David Starner
  2001-04-05  6:42     ` Ehud Lamm
@ 2001-04-05 13:11     ` Jean-Marc Bourguet
  2001-04-05 16:56       ` Paul Storm
  2 siblings, 1 reply; 28+ messages in thread
From: Jean-Marc Bourguet @ 2001-04-05 13:11 UTC (permalink / raw)


Paul Storm wrote:
> 
> For those that are interested, I produced code that (I think) produces
> a Hebrew character.  Here is the code:
> 
> with Ada.Characters.Handling; use Ada.Characters.Handling;
> with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;
> 
> procedure aleph is
> begin
>   Ada.Wide_Text_IO.Put (Item => Wide_Character'Val(1488));
> end aleph;
> 
> end of code
> 
> Here is the output.
> 
> ["05D0"]
> 
> end of output
> 
> 1488 decimal is 05D0 hexadecimal.
> 
> I said "(I think) produces".  I am thinking that my display showed the
> character as a code due to the lack of support for that character
> (set) on my system.  Can anyone confirm this for me?  Does that make
> sense?

I think this is one of the ways GNAT produces (and accepts) wide
characters.  You should be able to choose the format you want with the
FORM parameter.  Check the reference manual; there is a whole section on
wide characters.

-- Jean-Marc
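
A minimal sketch of what using the FORM parameter might look like, assuming
GNAT and its implementation-specific form string "WCEM=8" (UTF-8), as
described in the GNAT Reference Manual section quoted later in this
thread.  The procedure name and the file name aleph.txt are made up for
the example; other compilers may ignore or reject this form string.

```ada
with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;

procedure Aleph_File is
   --  Hypothetical example name and output file.
   F : File_Type;
begin
   --  "WCEM=8" is GNAT-specific: wide character encoding method UTF-8
   --  for this one file, independent of the source encoding.
   Create (File => F,
           Mode => Out_File,
           Name => "aleph.txt",
           Form => "WCEM=8");
   Put (F, Wide_Character'Val (16#05D0#));  --  HEBREW LETTER ALEF
   New_Line (F);
   Close (F);
end Aleph_File;
```

Writing to a file this way sidesteps the terminal: the file can then be
opened in any UTF-8 aware viewer to see whether the character came out as
intended.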




* Re: Hebrew language character set
  2001-04-05 16:56       ` Paul Storm
@ 2001-04-05 16:41         ` Florian Weimer
  2001-04-05 18:23           ` Paul Storm
                             ` (2 more replies)
  2001-04-05 18:35         ` David Starner
  1 sibling, 3 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-05 16:41 UTC (permalink / raw)



Paul Storm <paul.a.storm@lmco.com> writes:

> Jean-Marc,
> 
> Of course I RTFM first.  I said I was a newbie to Ada, not a jerk. (lol)
> Below is the entire section on Wide Text Input-Output(A.11) from the RM.
> It is not very useful, IMHO.  Frankly, I thought there would be more
> after hearing of Ada's robust language definition.  Now I'm not so sure.

$ gnatmake -gnatW8 aleph.adb 
$ ./aleph 
א
$ 

You were reading the wrong documentation.  Both the GNAT User Guide and
the GNAT Reference Manual contain valuable information (not just on
this issue, but in general).




* Re: Hebrew language character set
  2001-04-05  6:42     ` Ehud Lamm
@ 2001-04-05 16:46       ` Paul Storm
  0 siblings, 0 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-05 16:46 UTC (permalink / raw)


Now I don't know what to think.  I thought having Hebrew character
support on your machine would do it.  Now I'm wondering if there is
something else that is required, like Hebrew character support
installed in the Ada environment(?).

Does anybody know how to make this work?




* Re: Hebrew language character set
  2001-04-05 13:11     ` Jean-Marc Bourguet
@ 2001-04-05 16:56       ` Paul Storm
  2001-04-05 16:41         ` Florian Weimer
  2001-04-05 18:35         ` David Starner
  0 siblings, 2 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-05 16:56 UTC (permalink / raw)


Jean-Marc,

Of course I RTFM first.  I said I was a newbie to Ada, not a jerk. (lol)
Below is the entire section on Wide Text Input-Output(A.11) from the RM.
It is not very useful, IMHO.  Frankly, I thought there would be more
after hearing of Ada's robust language definition.  Now I'm not so sure.

Apparently this is not the trivial problem I thought it was.  As an Ada
newbie it makes me wonder about Ada's capabilities for program
internationalization.  In our internet age that is important.

Paul Storm

A.11 Wide Text Input-Output

The package Wide_Text_IO provides facilities for input and output in
human-readable form. Each file is read or written sequentially, as a
sequence of wide characters grouped into lines, and as a sequence of
lines grouped into pages. 

                                Static Semantics

The specification of package Wide_Text_IO is the same as that for
Text_IO, except that in each Get, Look_Ahead, Get_Immediate, Get_Line,
Put, and Put_Line procedure, any occurrence of Character is replaced by
Wide_Character, and any occurrence of String is replaced by
Wide_String.

Nongeneric equivalents of Wide_Text_IO.Integer_IO and
Wide_Text_IO.Float_IO are provided (as for Text_IO) for each predefined
numeric type, with names such as Ada.Integer_Wide_Text_IO,
Ada.Long_Integer_Wide_Text_IO, Ada.Float_Wide_Text_IO,
Ada.Long_Float_Wide_Text_IO.




* Re: Hebrew language character set
  2001-04-05 16:41         ` Florian Weimer
@ 2001-04-05 18:23           ` Paul Storm
  2001-04-05 18:27             ` Britt Snodgrass
  2001-04-05 18:38             ` Florian Weimer
  2001-04-05 18:36           ` David Starner
  2001-04-05 18:41           ` Paul Storm
  2 siblings, 2 replies; 28+ messages in thread
From: Paul Storm @ 2001-04-05 18:23 UTC (permalink / raw)


??

I copied what you did and saw no difference (see below).  Same thing on
the Sun.  Did I miss something?  Do you have some other environmental
setting that accounts for the different outputs (yours and mine)?

I did a search of the gnat reference at
http://lglwww.epfl.ch/docs/ada/gnat_ug.html#SEC128
for "-gnatW8".  It produced no results.  I do see it as an option of the
gnatmake command.
Entering "gnatmake" without options shows it in the list of options as,
"-gnatW    Wide character encoding method (h/u/s/e/8/b)". grrrrrrr

Paul Storm

D:\TEMP>gnatmake -gnatW8 aleph.adb
gnatbind -x aleph.ali
gnatlink aleph.ali

D:\TEMP>aleph.exe
["05D0"]




* Re: Hebrew language character set
  2001-04-05 18:23           ` Paul Storm
@ 2001-04-05 18:27             ` Britt Snodgrass
  2001-04-05 20:43               ` David Starner
  2001-04-05 18:38             ` Florian Weimer
  1 sibling, 1 reply; 28+ messages in thread
From: Britt Snodgrass @ 2001-04-05 18:27 UTC (permalink / raw)



Paul Storm wrote:
> 
> I did a search of the gnat reference at
> http://lglwww.epfl.ch/docs/ada/gnat_ug.html#SEC128
> for "-gnatW8".  It produced no results.  I do see it as an option of the


The URL you used points to a very old/obsolete version of the GNAT
reference.  See the "Wide Character Encodings" section of a GNAT 3.13p
or 3.14a users manual. This manual should have been installed along with
your compiler.  I've never done this myself but I imagine the device
driver for the output device you're writing to would have to know how to
display the Hebrew character code.

From the GNAT 3.14a Users Guide:

Wide Character Encodings

GNAT allows wide character codes to appear in character and string
literals, and also optionally in identifiers, by means of the following
possible encoding schemes: 

Hex Coding
     In this encoding, a wide character is represented by the following
     five character sequence:

          ESC a b c d

     Where a, b, c, d are the four hexadecimal characters (using
     uppercase letters) of the wide character code. For example, ESC
     A345 is used to represent the wide character with code 16#A345#.
     This scheme is compatible with use of the full Wide_Character set.

Upper-Half Coding
     The wide character with encoding 16#abcd# where the upper bit is
     on (in other words, "a" is in the range 8-F) is represented as two
     bytes, 16#ab# and 16#cd#. The second byte cannot be a format
     control character, but is not required to be in the upper half.
     This method can be also used for shift-JIS or EUC, where the
     internal coding matches the external coding.

Shift JIS Coding
     A wide character is represented by a two-character sequence,
     16#ab# and 16#cd#, with the restrictions described for upper-half
     encoding as described above. The internal character code is the
     corresponding JIS character according to the standard algorithm
     for Shift-JIS conversion. Only characters defined in the JIS code
     set table can be used with this encoding method.

EUC Coding
     A wide character is represented by a two-character sequence 16#ab#
     and 16#cd#, with both characters being in the upper half. The
     internal character code is the corresponding JIS character
     according to the EUC encoding algorithm. Only characters defined
     in the JIS code set table can be used with this encoding method.

UTF-8 Coding
     A wide character is represented using UCS Transformation Format 8
     (UTF-8) as defined in Annex R of ISO 10646-1/Am.2. Depending on
     the character value, the representation is a one, two, or three
     byte sequence:

          16#0000#-16#007f#: 2#0xxxxxxx#
          16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
          16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#

     where the xxx bits correspond to the left-padded bits of the
     16-bit character value. Note that all lower half ASCII characters
     are represented as ASCII bytes and all upper half characters and
     other wide characters are represented as sequences of upper-half
     (The full UTF-8 scheme allows for encoding 31-bit characters as
     6-byte sequences, but in this implementation, all UTF-8 sequences
     of four or more bytes length will be treated as illegal).

Brackets Coding
     In this encoding, a wide character is represented by the following
     eight character sequence:

          [ " a b c d " ]

     Where a, b, c, d are the four hexadecimal characters (using
     uppercase letters) of the wide character code. For example,
     ["A345"] is used to represent the wide character with code
     16#A345#. It is also possible (though not required) to use the
     Brackets coding for upper half characters. For example, the code
     16#A3# can be represented as ["A3"]. This scheme is compatible
     with use of the full Wide_Character set, and is also the method
     used for wide character encoding in the standard ACVC (Ada
     Compiler Validation Capability) test suite distributions.

Note: Some of these coding schemes do not permit the full use of the Ada
95 character set. For example, neither Shift JIS, nor EUC allow the use
of the upper half of the Latin-1 set.




* Re: Hebrew language character set
  2001-04-05 16:56       ` Paul Storm
  2001-04-05 16:41         ` Florian Weimer
@ 2001-04-05 18:35         ` David Starner
  2001-04-06 18:10           ` Ayende Rahien
  1 sibling, 1 reply; 28+ messages in thread
From: David Starner @ 2001-04-05 18:35 UTC (permalink / raw)

On Thu, 05 Apr 2001 08:56:07 -0800, Paul Storm <paul.a.storm@lmco.com> wrote:
>Apparently this is not the trivial problem I thought it was.  As an Ada
>newbie it makes me wonder about Ada's capabilities for program
>internationalization.  In our internet age that is important.

Try this in C. (1) It's impossible to use Hebrew characters, since
wchar_t is opaque - you can only portably use ASCII from inside the
code. (2) If you just stick the Unicode Hebrew values in there, it
might work (since wchar_t is often some form of Unicode), but it
probably will print out UTF-8 or UTF-16, which will look like noise
on your screen or even Ehud Lamm's system. The encouraged solution
in C is to store all non-ASCII characters external to the program
and treat them opaquely, if at all. The correct solutions if you
need more are hotly debated.

I'm sorry - you picked a hard problem in any programming language.
I was about to blame the Americans (who invented ASCII and other
English-only, 7/8-bit codes, and spread them round the world), but
equal blame falls on the Japanese (who use three different codes, and
oppose one unified code), the Europeans (who hardcoded 8-bit codes
and especially Latin-1 everywhere), and generally anyone who
invented a solution that worked for their language and quit. There's
no way to output a non-ASCII character and expect that it will work
in any more than a small subset of places.

There's no way for any language to portably guess what the
appropriate encoding of output would be. GNAT could possibly do
better, and hopefully I will find time to write up a decent proposal
on how it could do better, but the current solution differs little
from what almost any programming language would do.

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg


* Re: Hebrew language character set
  2001-04-05 16:41         ` Florian Weimer
  2001-04-05 18:23           ` Paul Storm
@ 2001-04-05 18:36           ` David Starner
  2001-04-06 21:26             ` Florian Weimer
  2001-04-05 18:41           ` Paul Storm
  2 siblings, 1 reply; 28+ messages in thread
From: David Starner @ 2001-04-05 18:36 UTC (permalink / raw)



On 05 Apr 2001 18:41:53 +0200, Florian Weimer <fw@deneb.enyo.de> wrote:
>$ gnatmake -gnatW8 aleph.adb 
>$ ./aleph 
>א
>$ 
>
>You were reading the wrong documentation.  Both the GNAT User Guide and
>the GNAT Reference Manual contain valuable information (not just on
>this issue, but in general).

-gnatW8 changes the representation of the source, not the output. 
The GNAT RM tells you how to change the output. 

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg




* Re: Hebrew language character set
  2001-04-05 18:23           ` Paul Storm
  2001-04-05 18:27             ` Britt Snodgrass
@ 2001-04-05 18:38             ` Florian Weimer
  1 sibling, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-05 18:38 UTC (permalink / raw)


Paul Storm <paul.a.storm@lmco.com> writes:

> D:\TEMP>gnatmake -gnatW8 aleph.adb
> gnatbind -x aleph.ali
> gnatlink aleph.ali
> 
> D:\TEMP>aleph.exe
> ["05D0"]

The gnatmake command is probably reusing an old object file because
there's no trace of a GCC invocation.




* Re: Hebrew language character set
  2001-04-05 16:41         ` Florian Weimer
  2001-04-05 18:23           ` Paul Storm
  2001-04-05 18:36           ` David Starner
@ 2001-04-05 18:41           ` Paul Storm
  2001-04-06  9:32             ` Florian Weimer
  2 siblings, 1 reply; 28+ messages in thread
From: Paul Storm @ 2001-04-05 18:41 UTC (permalink / raw)


I modified my program as follows, [begin code]

with Ada.Characters.Handling; use Ada.Characters.Handling;
with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;

procedure aleph is
begin
  Ada.Wide_Text_IO.Put (Item => Wide_Character'Val(16#0590#));
end aleph;

[end code]

Now I get the following results,

D:\TEMP>gnatmake -gnatW8 aleph.adb
gcc -c -gnatW8 aleph.adb
gnatbind -x aleph.ali
gnatlink aleph.ali

D:\TEMP>aleph.exe
+ï¿½

I made some mistake in coding my character in decimal instead of hex??

I found this in the GNAT User's Guide, btw,

-gnatWe 
Specify the method of encoding for wide characters. e is one of the
following: 
h Hex encoding (brackets coding also recognized) 
u Upper half encoding (brackets encoding also recognized) 
s Shift/JIS encoding (brackets encoding also recognized) 
e EUC encoding (brackets encoding also recognized) 
8 UTF-8 encoding (brackets encoding also recognized) 
b Brackets encoding only (default value) 

For full details on these encoding methods, see section Wide
Character Encodings. Note that brackets coding is always accepted, even
if one of the other options is specified, so for example -gnatW8
specifies that both brackets and UTF-8 encodings will be recognized. The
units that are with'ed directly or indirectly will be scanned using the
specified representation scheme, and so if one of the non-brackets
schemes is used, it must be used consistently throughout the program.
However, since brackets encoding is always recognized, it may be
conveniently used in standard libraries, allowing these libraries to be
used with any of the available coding schemes. If no -gnatW?
parameter is present, then the default representation is Brackets
encoding only. Note that the wide character representation that is
specified (explicitly or by default) for the main program also acts as
the default encoding used for Wide_Text_IO files if not specifically
overridden by a WCEM form parameter. 

Paul Storm




* Re: Hebrew language character set
  2001-04-05 18:27             ` Britt Snodgrass
@ 2001-04-05 20:43               ` David Starner
  2001-04-06 21:28                 ` Florian Weimer
  0 siblings, 1 reply; 28+ messages in thread
From: David Starner @ 2001-04-05 20:43 UTC (permalink / raw)


On Thu, 05 Apr 2001 13:27:04 -0500, Britt Snodgrass <britt@adapower.net> wrote:
>The URL you used points to a very old/obsolete version of the GNAT
>reference.  See the "Wide Character Encodings" section of a GNAT 3.13p
>or 3.14a users manual. 
[...]
>From the GNAT 3.14a Users Guide:
>
>Wide Character Encodings
>
>GNAT allows wide character codes to appear in character and string
>literals, and also optionally in identifiers, by means of the following
>possible encoding schemes: 

"in ... literals, and ... in identifiers" clearly shows that this 
section of the GNAT User Guide is talking about the representation
of the source, not I/O. Try

Wide_Text_IO
============

`Wide_Text_IO' is similar in most respects to Text_IO, except that both
input and output files may contain special sequences that represent
wide character values. The encoding scheme for a given file may be
specified using a FORM parameter:

     WCEM=X

as part of the FORM string (WCEM = wide character encoding method),
where X is one of the following characters

`h'
     Hex ESC encoding

`u'
     Upper half encoding

`s'
     Shift-JIS encoding

`e'
     EUC Encoding

`8'
     UTF-8 encoding

`b'
     Brackets encoding

   The encoding methods match those that can be used in a source
program, but there is no requirement that the encoding method used for
the source program be the same as the encoding method used for files,
and different files may use different encoding methods.

   The default encoding method for the standard files, and for opened
files for which no WCEM parameter is given in the FORM string matches
the wide character encoding specified for the main program (the default
being brackets encoding if no coding method was specified with -gnatW).

Hex Coding
     In this encoding, a wide character is represented by a five
     character sequence:

          ESC a b c d

     where A, B, C, D are the four hexadecimal characters (using upper
     case letters) of the wide character code. For example, ESC A345 is
     used to represent the wide character with code 16#A345#. This
     scheme is compatible with use of the full `Wide_Character' set.

Upper Half Coding
     The wide character with encoding 16#abcd#, where the upper bit is
     on (i.e. a is in the range 8-F) is represented as two bytes 16#ab#
     and 16#cd#. The second byte may never be a format control
     character, but is not required to be in the upper half. This
     method can be also used for shift-JIS or EUC where the internal
     coding matches the external coding.

Shift JIS Coding
     A wide character is represented by a two character sequence 16#ab#
     and 16#cd#, with the restrictions described for upper half
     encoding as described above. The internal character code is the
     corresponding JIS character according to the standard algorithm
     for Shift-JIS conversion. Only characters defined in the JIS code
     set table can be used with this encoding method.

EUC Coding
     A wide character is represented by a two character sequence 16#ab#
     and 16#cd#, with both characters being in the upper half. The
     internal character code is the corresponding JIS character
     according to the EUC encoding algorithm. Only characters defined
     in the JIS code set table can be used with this encoding method.

UTF-8 Coding
     A wide character is represented using UCS Transformation Format 8
     (UTF-8) as defined in Annex R of ISO 10646-1/Am.2.  Depending on
     the character value, the representation is a one, two, or three
     byte sequence:

          16#0000#-16#007f#: 2#0xxxxxxx#
          16#0080#-16#07ff#: 2#110xxxxx# 2#10xxxxxx#
          16#0800#-16#ffff#: 2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#

     where the xxx bits correspond to the left-padded bits of the
     16-bit character value. Note that all lower half ASCII characters
     are represented as ASCII bytes and all upper half characters and
     other wide characters are represented as sequences of upper-half
     (The full UTF-8 scheme allows for encoding 31-bit characters as
     6-byte sequences, but in this implementation, all UTF-8 sequences
     of four or more bytes length will raise a Constraint_Error, as
     will all illegal UTF-8 sequences.)

Brackets Coding
     In this encoding, a wide character is represented by the following
     eight character sequence:

          [ " a b c d " ]
     Where `a', `b', `c', `d' are the four hexadecimal characters
     (using uppercase letters) of the wide character code. For example,
     `["A345"]' is used to represent the wide character with code
     `16#A345#'.  This scheme is compatible with use of the full
     Wide_Character set.  On input, brackets coding can also be used
     for upper half characters, e.g. `["C1"]' for lower case a.
     However, on output, brackets notation is only used for wide
     characters with a code greater than `16#FF#'.

   For the coding schemes other than Hex and Brackets encoding, not all
wide character values can be represented. An attempt to output a
character that cannot be represented using the encoding scheme for the
file causes Constraint_Error to be raised. An invalid wide character
sequence on input also causes Constraint_Error to be raised.
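
To make the UTF-8 table above concrete, here is a small sketch that applies
the same bit patterns by hand to a 16-bit code and prints the resulting
bytes in hex.  It is an illustration only, not GNAT's own implementation;
the procedure and helper names are made up, and it uses just the standard
Interfaces and Ada.Text_IO packages.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Utf8_Bytes is
   --  Hypothetical example; follows the one/two/three byte table above.
   package Hex_IO is new Ada.Text_IO.Modular_IO (Unsigned_16);

   procedure Put_Byte (B : Unsigned_16) is
   begin
      Hex_IO.Put (B, Width => 0, Base => 16);
      Ada.Text_IO.Put (' ');
   end Put_Byte;

   procedure Put_Utf8 (Code : Unsigned_16) is
   begin
      if Code <= 16#007F# then
         --  2#0xxxxxxx#
         Put_Byte (Code);
      elsif Code <= 16#07FF# then
         --  2#110xxxxx# 2#10xxxxxx#
         Put_Byte (16#C0# or Shift_Right (Code, 6));
         Put_Byte (16#80# or (Code and 16#3F#));
      else
         --  2#1110xxxx# 2#10xxxxxx# 2#10xxxxxx#
         Put_Byte (16#E0# or Shift_Right (Code, 12));
         Put_Byte (16#80# or (Shift_Right (Code, 6) and 16#3F#));
         Put_Byte (16#80# or (Code and 16#3F#));
      end if;
      Ada.Text_IO.New_Line;
   end Put_Utf8;
begin
   Put_Utf8 (16#05D0#);  --  HEBREW LETTER ALEF: prints 16#D7# 16#90#
end Utf8_Bytes;
```

Alef falls in the 16#0080#-16#07ff# row, so it becomes two bytes, both in
the upper half.  A terminal that is not interpreting its input as UTF-8
will show those two bytes as two unrelated characters.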

[...]
======================================================================

(I must, however, apologize to the person who used -gnatW8 to change the
output, while I claimed it only changed the source encoding. It's clear I 
didn't read this section well enough, because, for better or worse,
it changes the default encoding. It must get interesting when the program 
is compiled with different default encodings for each file, though ...)

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg




* Re: Hebrew language character set
  2001-04-05 18:41           ` Paul Storm
@ 2001-04-06  9:32             ` Florian Weimer
  0 siblings, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-06  9:32 UTC (permalink / raw)

Paul Storm <paul.a.storm@lmco.com> writes:

> Now I get the following results,
> 
> D:\TEMP>gnatmake -gnatW8 aleph.adb
> gcc -c -gnatW8 aleph.adb

The main program is recompiled and the -gnatW8 takes effect, that's
why you get a different output this time.

> gnatbind -x aleph.ali
> gnatlink aleph.ali
> 
> D:\TEMP>aleph.exe
> +ï¿½

Your terminal is not UTF-8 capable.  Use a terminal which is.

> I made some mistake in coding my character in decimal instead of hex??

No, this doesn't have any effect at all.
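
A small sketch of why the radix does not matter: a decimal and a based
literal denote the same value, so both forms give the same Wide_Character.
The procedure name and constants below are made up for illustration and
are not taken from the original aleph.adb.

```ada
with Ada.Wide_Text_IO;

procedure Aleph_Literals is
   --  U+05D0 (Hebrew letter alef) written in decimal and in hex.
   Aleph_Dec : constant Wide_Character := Wide_Character'Val (1488);
   Aleph_Hex : constant Wide_Character := Wide_Character'Val (16#05D0#);
begin
   --  Only checked when assertions are enabled (-gnata with GNAT).
   pragma Assert (Aleph_Dec = Aleph_Hex);

   --  What appears on screen depends on the encoding selected with
   --  -gnatW for the main program and on what the terminal can display.
   Ada.Wide_Text_IO.Put (Aleph_Hex);
   Ada.Wide_Text_IO.New_Line;
end Aleph_Literals;
```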

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Hebrew language character set
  2001-04-05 18:35         ` David Starner
@ 2001-04-06 18:10           ` Ayende Rahien
  2001-04-06 22:27             ` David Starner
  2001-04-07  5:12             ` Florian Weimer
  0 siblings, 2 replies; 28+ messages in thread
From: Ayende Rahien @ 2001-04-06 18:10 UTC (permalink / raw)



"David Starner" <dvdeug@x8b4e53cd.dhcp.okstate.edu> wrote in message
news:9aidu0$9281@news.cis.okstate.edu...
> the Japanese (who use three different codes, and
> oppose one unified code

Why do they oppose a unified code?





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Hebrew language character set
  2001-04-05 18:36           ` David Starner
@ 2001-04-06 21:26             ` Florian Weimer
  0 siblings, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-06 21:26 UTC (permalink / raw)


dvdeug@x8b4e53cd.dhcp.okstate.edu (David Starner) writes:

> >You were reading the wrong documentation.  Both the GNAT User Guide and
> >the GNAT Reference Manual contain valuable information (not just on
> >this issue, but in general).
> 
> -gnatw8 changes the representation of the source, not the output. 
> The GNAT RM tells you how to change the output. 

According to my version of the GNAT RM, this is not correct, and the
GNAT RM is consistent with my observations.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Hebrew language character set
  2001-04-05 20:43               ` David Starner
@ 2001-04-06 21:28                 ` Florian Weimer
  0 siblings, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-06 21:28 UTC (permalink / raw)


dvdeug@x8b4e53cd.dhcp.okstate.edu (David Starner) writes:

>    The default encoding method for the standard files, and for opened
> files for which no WCEM parameter is given in the FORM string matches
> the wide character encoding specified for the main program (the default
> being brackets encoding if no coding method was specified with -gnatW).

> It must get interesting when the program is compiled with different
> default encodings for each file, though ...)

There's only a single main program, so this is not a problem, I guess.
(I don't know what happens in a distributed system, though)
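
For a single file, the WCEM parameter mentioned in the quoted paragraph can
be given explicitly, so the result does not depend on how the main program
was compiled. A minimal sketch, assuming GNAT (the Form string is
implementation-defined) and a made-up file name:

```ada
with Ada.Wide_Text_IO;

procedure Aleph_File is
   use Ada.Wide_Text_IO;
   F : File_Type;
begin
   --  "WCEM=8" asks GNAT for UTF-8 on this file only. Other compilers
   --  may ignore or reject this Form string.
   Create (F, Out_File, "aleph.txt", Form => "WCEM=8");
   Put (F, Wide_Character'Val (16#05D0#));  --  Hebrew letter alef
   New_Line (F);
   Close (F);
end Aleph_File;
```

Viewing aleph.txt in a UTF-8 capable editor then sidesteps the terminal
problem altogether.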



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Hebrew language character set
  2001-04-06 18:10           ` Ayende Rahien
@ 2001-04-06 22:27             ` David Starner
  2001-04-08 19:03               ` Robert A Duff
  2001-04-07  5:12             ` Florian Weimer
  1 sibling, 1 reply; 28+ messages in thread
From: David Starner @ 2001-04-06 22:27 UTC (permalink / raw)

On Fri, 6 Apr 2001 20:10:28 +0200, Ayende Rahien <Dont@spam.me> wrote:
> "David Starner" <dvdeug@x8b4e53cd.dhcp.okstate.edu> wrote in message
> news:9aidu0$9281@news.cis.okstate.edu...
>> the Japanese (who use three different codes, and
>> oppose one unified code
> 
> Why do they oppose a unified code?

It's not so much that they oppose a unified code, as they oppose Unicode.
Chinese and Japanese use slightly different styles of characters
that were unified by Unicode. (It helps that they already use 2-byte
characters, so they aren't hurt by missing characters as much.)
There is a Japanese "universal" character set, TRON, but it's a
direct rip-off of Unicode for non-CJK characters, much less
carefully designed for CJK than Unicode, and controlled by a
Japanese group, so the rest of the world doesn't want anything to do
with it. Also, a lot of Japanese seem to think that a large number
of character sets, each controlled by a national body, is a better
idea than one controlled by a consortium / international standards
body.

Ob-Ada: Why does Ada have Latin-1 as Character and Unicode for
Wide_Character? Considering that most Ada systems don't go to
the trouble of getting it right on non-Latin-1/non-Unicode systems, and
a lot of data in Character and Wide_Character is in the local encodings,
it would have been better to adopt an opaque encoding like C did and
deal with the difficulties.

-- 
David Starner - dstarner98@aasaa.ofe.org
Pointless website: http://dvdeug.dhis.org
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Hebrew language character set
  2001-04-06 18:10           ` Ayende Rahien
  2001-04-06 22:27             ` David Starner
@ 2001-04-07  5:12             ` Florian Weimer
  1 sibling, 0 replies; 28+ messages in thread
From: Florian Weimer @ 2001-04-07  5:12 UTC (permalink / raw)

"Ayende Rahien" <Dont@spam.me> writes:

> "David Starner" <dvdeug@x8b4e53cd.dhcp.okstate.edu> wrote in message
> news:9aidu0$9281@news.cis.okstate.edu...
> > the Japanese (who use three different codes, and
> > oppose one unified code
> 
> Why do they oppose a unified code?

I don't know why they oppose a national unification project, but they
oppose Unicode because in the first version of the standard, the
glyphs which were unified with the Chinese variants were printed in
Chinese style only.  Since then, Japanese people tend to vigorously
campaign against CJK unification, claiming that it results in
readability problems and so on.  Surprisingly, Chinese and Korean
people seem to have fewer problems with unification.

Of course, the solution is not a Unicode version without CJK
unification, but a proper Unicode font featuring the Japanese version
of the glyphs.  This might result in the wrong version of the glyphs
displayed in Chinese portions, but even the Chinese don't seem to be
bothered by this.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Hebrew language character set
  2001-04-06 22:27             ` David Starner
@ 2001-04-08 19:03               ` Robert A Duff
  0 siblings, 0 replies; 28+ messages in thread
From: Robert A Duff @ 2001-04-08 19:03 UTC (permalink / raw)


dvdeug@x8b4e53cd.dhcp.okstate.edu (David Starner) writes:

> Ob-Ada: Why does Ada have Latin-1 as Character and Unicode for
> Wide_Character?

Note that in Ada 83, Character was 7-bit ASCII (and there was no
Wide_Character).

- Bob



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2001-04-08 19:03 UTC | newest]

Thread overview: 28+ messages
2001-04-03 19:08 Hebrew language character set Paul Storm
2001-04-03 19:42 ` Florian Weimer
2001-04-03 23:05   ` Paul Storm
2001-04-04  3:09     ` David Starner
2001-04-04  9:20     ` Florian Weimer
2001-04-04 17:35 ` David Botton
2001-04-04 19:26   ` Paul Storm
2001-04-04 21:36   ` Paul Storm
2001-04-05  3:03     ` David Starner
2001-04-05  6:42     ` Ehud Lamm
2001-04-05 16:46       ` Paul Storm
2001-04-05 13:11     ` Jean-Marc Bourguet
2001-04-05 16:56       ` Paul Storm
2001-04-05 16:41         ` Florian Weimer
2001-04-05 18:23           ` Paul Storm
2001-04-05 18:27             ` Britt Snodgrass
2001-04-05 20:43               ` David Starner
2001-04-06 21:28                 ` Florian Weimer
2001-04-05 18:38             ` Florian Weimer
2001-04-05 18:36           ` David Starner
2001-04-06 21:26             ` Florian Weimer
2001-04-05 18:41           ` Paul Storm
2001-04-06  9:32             ` Florian Weimer
2001-04-05 18:35         ` David Starner
2001-04-06 18:10           ` Ayende Rahien
2001-04-06 22:27             ` David Starner
2001-04-08 19:03               ` Robert A Duff
2001-04-07  5:12             ` Florian Weimer
