comp.lang.ada
* Ada + Multi-Byte/Wide Chars = Modern Language?
@ 1995-01-23 18:32 Richard L. Goerwitz
  1995-01-24 19:28 ` Robert Dewar
  1995-01-26  3:36 ` R. William Beckwith
  0 siblings, 2 replies; 10+ messages in thread
From: Richard L. Goerwitz @ 1995-01-23 18:32 UTC


Ada95 looks to be a useful language.

One question, though:  Does it fully support wide/multi-byte
characters?  For example, am I free to write apps that utilize UTF-8
or Unicode?  Will I be able to interface these programs with the
outside world (e.g., X)?  Does GNAT support these features?

Any advice would be appreciated.

-- 

   Richard L. Goerwitz     ***      goer@midway.uchicago.edu




* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-23 18:32 Ada + Multi-Byte/Wide Chars = Modern Language? Richard L. Goerwitz
@ 1995-01-24 19:28 ` Robert Dewar
  1995-01-26 12:56   ` Gentle
       [not found]   ` <1995Jan27.040708.22494@midway.uchicago.edu>
  1995-01-26  3:36 ` R. William Beckwith
  1 sibling, 2 replies; 10+ messages in thread
From: Robert Dewar @ 1995-01-24 19:28 UTC


Ada 95 fully supports ISO 10646 = Unicode (thanks to the programming deity
in the sky for this welcome unification :-)

GNAT supports much of this, providing several different methods for
encoding wide characters, including JIS, shift-JIS, EUC, and a general
method allowing arbitrary encoding. We do not yet have Wide_Text_IO
implemented.





* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-23 18:32 Ada + Multi-Byte/Wide Chars = Modern Language? Richard L. Goerwitz
  1995-01-24 19:28 ` Robert Dewar
@ 1995-01-26  3:36 ` R. William Beckwith
  1 sibling, 0 replies; 10+ messages in thread
From: R. William Beckwith @ 1995-01-26  3:36 UTC


Richard L. Goerwitz (goer@quads.uchicago.edu) wrote:
: Ada95 looks to be a useful language.

It is.

: One question, though:  Does it fully support wide/multi-byte
: characters?  For example, am I free to write apps that utilize UTF-8
: or Unicode?  Will I be able to interface these programs with the
: outside world (e.g., X)?  Does GNAT support these features?

There is a wide (16-bit) character type built into Ada95 called
Wide_Character.

This Wide_Character type maps nicely to Xlib's wchar_t type
(we did this in our Ada95 X11R6 multi-threaded Xlib binding). 

wchar_t is used in Xlib's XIMText struct and the Xwc... functions.

... Bill




* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-24 19:28 ` Robert Dewar
@ 1995-01-26 12:56   ` Gentle
  1995-01-28  1:56     ` R. William Beckwith
       [not found]   ` <1995Jan27.040708.22494@midway.uchicago.edu>
  1 sibling, 1 reply; 10+ messages in thread
From: Gentle @ 1995-01-26 12:56 UTC


On 24 Jan 1995 14:28:46 -0500, Robert Dewar (dewar@cs.nyu.edu) wrote:
: GNAT supports much of this, providing several different methods for
: encoding wide characters, including JIS, shift-JIS, EUC, and a general
: method allowing arbitrary encoding. We do not yet have Wide_Text_IO
: implemented.

  This may be a silly question, but what exactly is a "wide character"?

--
=========================================================================
gentle@cnj.digex.net  -  Finger for PGP Public Key
Software Engineer
Edison, NJ

Twink code:
T6 C1 L2w h-(:) d-- av w- e+ g+ f t++(2,4,6,7,8) k++v s- m1 m2 q-

A chubby man with a white beard and a red suit will approach you soon.
Avoid him.  He's a Commie.
=========================================================================




* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-26 12:56   ` Gentle
@ 1995-01-28  1:56     ` R. William Beckwith
  1995-01-29 17:17       ` Richard L. Goerwitz
  0 siblings, 1 reply; 10+ messages in thread
From: R. William Beckwith @ 1995-01-28  1:56 UTC


Gentle (gentle@cnj.digex.net) wrote:

:   This may be a silly question, but what exactly is a "wide character"?

A two-byte (16-bit) character type built into the Ada95 language.  From
package Standard in the RM (A.1):

    -- The declaration of type Wide_Character is based on the standard
    -- ISO 10646 BMP character set.  The first 256 positions have the
    -- same contents as type Character.  See 3.5.2.

    type Wide_Character is (nul, soh ... FFFE, FFFF);

    ...

    type Wide_String is array(Positive range <>) of Wide_Character;
    pragma Pack(Wide_String);
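
A minimal sketch of how these two declarations might be used, assuming an
Ada 95 compiler that provides Ada.Characters.Handling (RM A.3.2).  The
procedure and object names are illustrative only.  Later revisions of the
language moved the wide conversions to Ada.Characters.Conversions, so a
newer compiler may flag the calls below as obsolescent.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Wide_Demo is
   use Ada.Characters.Handling;

   --  A string literal can denote a Wide_String; each element is a
   --  Wide_Character.
   Greeting : constant Wide_String := "Hello";

   --  Widening an ordinary String is always possible.
   Widened : constant Wide_String := To_Wide_String ("Ada");
begin
   --  Narrowing is exact only when every element lies in the first
   --  256 positions, which Is_String checks.
   if Is_String (Greeting) then
      Ada.Text_IO.Put_Line (To_String (Greeting));
   end if;

   --  Length is counted in characters, not bytes.
   Ada.Text_IO.Put_Line (Integer'Image (Widened'Length));
end Wide_Demo;
```

Text_IO is used for output here only because it works on String; the
sketch does not depend on Wide_Text_IO.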

... Bill




* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
       [not found]   ` <1995Jan27.040708.22494@midway.uchicago.edu>
@ 1995-01-28 18:30     ` Robert Dewar
  0 siblings, 0 replies; 10+ messages in thread
From: Robert Dewar @ 1995-01-28 18:30 UTC



"Shouldn't this sort of thing [providing access to Unicode] be done by the
 underlying C compiler"

What underlying C compiler? There is no C compiler "underlying" GNAT, and
as has been emphasized in the past, GNAT is in no sense an Ada-to-C
translator. GNAT does provide full access to Unicode via the wide
character type, including several source encoding schemes.





* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-28  1:56     ` R. William Beckwith
@ 1995-01-29 17:17       ` Richard L. Goerwitz
  1995-01-30 17:27         ` Vincent Broman
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Richard L. Goerwitz @ 1995-01-29 17:17 UTC


R. William Beckwith writes:
>
>    -- The declaration of type Wide_Character is based on the standard
>    -- ISO 10646 BMP character set.  The first 256 positions have the
>    -- same contents as type Character.  See 3.5.2.
        ^^^^^^^^^^^^^

What is meant by "same contents"?  There are several multi-byte and wide
character standards.  By using the word "contents" here, is the standard
implying that type Character is assumed to encode specific glyphs?

-- 

   Richard L. Goerwitz     ***      goer@midway.uchicago.edu




* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-29 17:17       ` Richard L. Goerwitz
@ 1995-01-30 17:27         ` Vincent Broman
  1995-02-01 12:13         ` Robert Dewar
  1995-02-02  2:53         ` Tucker Taft
  2 siblings, 0 replies; 10+ messages in thread
From: Vincent Broman @ 1995-01-30 17:27 UTC


goer@midway.uchicago.edu asked about RM95 A.1 on Wide_Character:
> What is meant by "same contents"?

Ada95 predefines Character to be ISO 8859-1 (Latin-1) and
predefines Wide_Character to be the ISO 10646 BMP (Unicode).
The first 256 codes in Unicode have the same meaning as in Latin-1.
The literals for the first 256 elements of the type Wide_Character
are the same as for the type Character.

The user can also define character types other than these two
predefined ones.
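
A small sketch of that correspondence, assuming only the predefined
types in package Standard.  The procedure name and the choice of
position 233 are illustrative.

```ada
with Ada.Text_IO;

procedure Same_Positions is
   --  Position 233 is LATIN SMALL LETTER E WITH ACUTE in Latin-1.
   C : constant Character := Character'Val (233);

   --  The Wide_Character at the same position number denotes the same
   --  character, so widening is a plain position-preserving conversion.
   W : constant Wide_Character := Wide_Character'Val (Character'Pos (C));
begin
   --  'A' is a literal of both types; the attribute prefix selects
   --  which type is meant, and both give the same position number.
   if Wide_Character'Pos ('A') = Character'Pos ('A') then
      Ada.Text_IO.Put_Line ("'A' has the same position in both types");
   end if;

   Ada.Text_IO.Put_Line
     ("Position of W:" & Integer'Image (Wide_Character'Pos (W)));
end Same_Positions;
```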

Vincent Broman,  code 572 Bayside                        Email: broman@nosc.mil
Naval Command Control and Ocean Surveillance Center, RDT&E Div.
San Diego, CA  92152-6147,  USA                          Phone: +1 619 553 1641




* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-29 17:17       ` Richard L. Goerwitz
  1995-01-30 17:27         ` Vincent Broman
@ 1995-02-01 12:13         ` Robert Dewar
  1995-02-02  2:53         ` Tucker Taft
  2 siblings, 0 replies; 10+ messages in thread
From: Robert Dewar @ 1995-02-01 12:13 UTC


Richard Goerwitz says:

>In R. William Beckwith writes:
>>
>>    -- The declaration of type Wide_Character is based on the standard
>>    -- ISO 10646 BMP character set.  The first 256 positions have the
>>    -- same contents as type Character.  See 3.5.2.
>        ^^^^^^^^^^^^^
>
>What is meant by "same contents"?  There are several multi-byte and wide
>character standards.  By using the word "contents" here is the standard
>implying that type Character is assumed to encode specific glyphs?

Yes, there are several multi-byte and wide character standards, but what's
that got to do with it? As Bill (and the RM!) make clear, Ada uses the
ISO 10646 BMP standard, which is identical to Unicode. The first 256
character positions of the BMP set correspond to Latin-1, which is what
type Character in Ada is. That is, the answer to your question is that the
Standard is not merely "implying" the encoding of specific glyphs; it is
in a sense requiring it.

Of course in practice there are no language semantics that depend on the
specific glyphs, so a given Ada compiler can be used in an environment
with a quite different set of glyphs.

In addition, Ada compilers are free to provide alternative localizations
of the definition of Character and/or Wide_Character. For example, in
GNAT there is a switch -gnati with the following settings:

  -gnati1   Latin-1 (the standard setting)
  -gnati2   Latin-2
  -gnati3   Latin-3
  -gnati4   Latin-4
  -gnatip   IBM-PC character set
  -gnatif   Full upper half allowed in identifiers with no case equivalence
  -gnatin   No upper half characters allowed in identifiers
  -gnatiw   Wide characters allowed in identifiers

These settings govern the set of characters that are accepted in identifiers,
and the definition of upper-lower case correspondence. For detailed definitions
of these character sets, see the source of package Csets in the GNAT sources.






* Re: Ada + Multi-Byte/Wide Chars = Modern Language?
  1995-01-29 17:17       ` Richard L. Goerwitz
  1995-01-30 17:27         ` Vincent Broman
  1995-02-01 12:13         ` Robert Dewar
@ 1995-02-02  2:53         ` Tucker Taft
  2 siblings, 0 replies; 10+ messages in thread
From: Tucker Taft @ 1995-02-02  2:53 UTC


In article <1995Jan29.171712.4531@midway.uchicago.edu>,
Richard L. Goerwitz <goer@midway.uchicago.edu> wrote:

>In R. William Beckwith writes:
>>
>>    -- The declaration of type Wide_Character is based on the standard
>>    -- ISO 10646 BMP character set.  The first 256 positions have the
>>    -- same contents as type Character.  See 3.5.2.
>        ^^^^^^^^^^^^^
>
>What is meant by "same contents"?  There are several multi-byte and wide
>character standards.  By using the word "contents" here is the standard
>implying that type Character is assumed to encode specific glyphs?

The standard doesn't use the term "contents" -- that was Bill's 
paraphrasing.

In any case, type Character is (in the "standard mode") interpreted
as the characters of ISO 8859-1 (aka Latin-1), or equivalently
Row 00 of ISO 10646 BMP (Basic Multilingual Plane).
Type Wide_Character is (in the standard mode) interpreted as the characters
of ISO 10646 BMP.

Compilers are allowed to support localization of Character to better
support the local character set, though Ada has always supported
multiple character sets very well, since character/string literals can be
of any "character"/"string" type, either Standard.Character/String or 
a user-defined character/string type.
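
As an illustration of that last point, here is a minimal sketch of a
user-defined character type and string type.  It follows the Roman_Digit
example given in the RM (3.5.2); the procedure name and the object Year
are made up for this sketch.

```ada
with Ada.Text_IO;

procedure Roman_Demo is
   --  An enumeration type with at least one character literal is a
   --  character type (RM 3.5.2).
   type Roman_Digit is ('I', 'V', 'X', 'L', 'C', 'D', 'M');

   --  A one-dimensional array of a character type is a string type,
   --  so string literals are available for it (RM 3.6.3).
   type Roman_Numeral is array (Positive range <>) of Roman_Digit;

   --  This literal is a Roman_Numeral, not a Standard.String.
   Year : constant Roman_Numeral := "MCMXCV";
begin
   --  'Image of a character literal includes the surrounding quotes.
   for I in Year'Range loop
      Ada.Text_IO.Put (Roman_Digit'Image (Year (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Roman_Demo;
```

A literal such as "ABC" would be rejected at compile time here, because
'A' and 'B' are not literals of Roman_Digit.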

>   Richard L. Goerwitz     ***      goer@midway.uchicago.edu

-Tucker Taft  stt@inmet.com
Intermetrics, Inc.



