comp.lang.ada
* Supporting full Unicode
@ 2004-05-11 17:45 Brian Catlin
  2004-05-12  7:44 ` Ludovic Brenta
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Brian Catlin @ 2004-05-11 17:45 UTC (permalink / raw)


The complete definition of Unicode allows for 2-, 3-, and 4-byte characters.  How
is this supported in Ada95 and Ada0y?

 -Brian





^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Supporting full Unicode
  2004-05-11 17:45 Supporting full Unicode Brian Catlin
@ 2004-05-12  7:44 ` Ludovic Brenta
  2004-05-12  8:23   ` Marius Amado Alves
  2004-05-12  9:41   ` David Starner
  2004-05-12  9:30 ` Martin Krischik
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Ludovic Brenta @ 2004-05-12  7:44 UTC (permalink / raw)



Brian Catlin asked:
> The complete definition of Unicode allows for 2-,3-, and 4-byte
> characters. How is this supported in Ada95 and Ada0y?

I am not aware of any differences between Ada 95 and Ada 2005 in that
respect.  Ada 95 has a type Wide_Character, "whose values correspond
to the 65536 code positions of the ISO 10646 Basic Multilingual Plane
(BMP)." (RM 3.5.2(3)).  So, Ada 95 supports the UCS-2 encoding
natively.  The other standard type, Character, is defined as Latin-1.

As you can see, there is no standard support for 3- and 4-byte
characters; you would have to support them in a nonstandard way, e.g.

type Wide_Wide_Character is mod 2**32; -- UCS-4
type Wide_Wide_String is array (Natural range <>) of Wide_Wide_Character;

But I would favour using UTF-8 as the internal encoding anyway.  It is
easy to define a UTF8_String type similar to the above.  GtkAda has
such a type, as GTK+ uses UTF-8 as both internal and external
encoding.
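
Such a type can be as small as one declaration. A minimal sketch (hypothetical; GtkAda's actual declaration may differ):

```ada
--  A distinct string type whose contents are, by convention,
--  UTF-8-encoded octets; the type system does not enforce the encoding.
type UTF8_String is new String;
```

Making it a distinct type rather than a subtype forces explicit conversions at the boundaries where Latin-1 and UTF-8 data meet.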

-- 
Ludovic Brenta.


-- 
Use our news server 'news.foorum.com' from anywhere.
More details at: http://nnrpinfo.go.foorum.com/




* Re: Supporting full Unicode
  2004-05-12  7:44 ` Ludovic Brenta
@ 2004-05-12  8:23   ` Marius Amado Alves
  2004-05-12 10:43     ` Martin Krischik
  2004-05-12 19:25     ` David Starner
  2004-05-12  9:41   ` David Starner
  1 sibling, 2 replies; 25+ messages in thread
From: Marius Amado Alves @ 2004-05-12  8:23 UTC (permalink / raw)
  To: comp.lang.ada

> But I would favour using UTF-8 as the internal encoding anyway.  It is
> easy to define a UTF8_String type similar to the above.  GtkAda has
> such a type, as GTK+ uses UTF-8 as both internal and external
> encoding.

Indeed UTF-8 seems to rule, probably because there are more ready-to-use
low-level tools for 8-bit characters. Actually the proper tools for Unicode
should be 24-bit based. An ugly fact about Unicode is that the code space is
24-bit, yet the unit sizes of the encodings are everything but 24 (8, 16, 32).





* Re: Supporting full Unicode
  2004-05-11 17:45 Supporting full Unicode Brian Catlin
  2004-05-12  7:44 ` Ludovic Brenta
@ 2004-05-12  9:30 ` Martin Krischik
  2004-05-13  1:15 ` Randy Brukardt
  2004-05-14  4:00 ` Vadim Godunko
  3 siblings, 0 replies; 25+ messages in thread
From: Martin Krischik @ 2004-05-12  9:30 UTC (permalink / raw)


Brian Catlin wrote:

> The complete definition of Unicode allows for 2-,3-, and 4-byte
> characters.  How is this supported in Ada95 and Ada0y?

Built in for Ada 95 are ISO-8859-1 and ISO-10646.

Currently XML/Ada has full support for Unicode.

BTW: it is 1-, 2-, and 4-byte characters.

With Regards

Martin

-- 
mailto://krischik@users.sourceforge.net
http://www.ada.krischik.com





* Re: Supporting full Unicode
  2004-05-12  7:44 ` Ludovic Brenta
  2004-05-12  8:23   ` Marius Amado Alves
@ 2004-05-12  9:41   ` David Starner
  2004-05-12 10:16     ` Björn Persson
  1 sibling, 1 reply; 25+ messages in thread
From: David Starner @ 2004-05-12  9:41 UTC (permalink / raw)


On Wed, 12 May 2004 07:44:56 +0000, Ludovic Brenta wrote:

> 
> Brian Catlin asked:
>> The complete definition of Unicode allows for 2-,3-, and 4-byte
>> characters. How is this supported in Ada95 and Ada0y?
> 
> I am not aware of any differences between Ada 95 and Ada 2005 in that
> respect.  Ada 95 has a type Wide_Character, "whose values correspond
> to the 65536 code positions of the ISO 10646 Basic Multilingual Plane
> (BMP)." (RM 3.5.2(3)).  

Ada 2005 is apparently going to have a Wide_Wide_Character for full
Unicode. That's unfortunate; they should have defined Wide_Character
to be UTF-16 like Java did. Besides UTF-16 taking less space and hence
often being faster, the chosen approach leaves a Wide_Character that
should never be used but will be anyway. (No, I don't care if you don't
use those characters; one of the people using your program will be an
archaeologist who wants to transfer his Unicode data from another
program to yours, and will be rather annoyed that your "Unicode"
program doesn't support perfectly valid Unicode data.)




* Re: Supporting full Unicode
  2004-05-12  9:41   ` David Starner
@ 2004-05-12 10:16     ` Björn Persson
  2004-05-12 10:57       ` Ludovic Brenta
  0 siblings, 1 reply; 25+ messages in thread
From: Björn Persson @ 2004-05-12 10:16 UTC (permalink / raw)


David Starner wrote:

> they should have defined Wide_Character to be UTF-16 like Java did.

Keeping in mind that in UTF-16 some characters take two bytes and others 
take four, how do you propose to define that type?

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu





* Re: Supporting full Unicode
  2004-05-12  8:23   ` Marius Amado Alves
@ 2004-05-12 10:43     ` Martin Krischik
  2004-05-12 14:56       ` Björn Persson
  2004-05-12 19:09       ` David Starner
  2004-05-12 19:25     ` David Starner
  1 sibling, 2 replies; 25+ messages in thread
From: Martin Krischik @ 2004-05-12 10:43 UTC (permalink / raw)


Marius  wrote:

>> But I would favour using UTF-8 as the internal encoding anyway.  It is
>> easy to define a UTF8_String type similar to the above.  GtkAda has
>> such a type, as GTK+ uses UTF-8 as both internal and external
>> encoding.
 
> Indeed UTF-8 seems to rule. Probably because there are more ready-to-use
> low level tools for 8-bit characters. Actually the proper tools for
> Unicode should be 24-bit based. An ugly fact about Unicode is that the
> code space is 24-bit and the encodings are all but 24 (8, 16, 32).

Not quite right. The current code space is 32 bit, of which only 24 bits are
used. That of course means that in UTF-8 a maximum of 4 bytes per character
is used.

However, this may change when the extraterrestrials arrive ;-). Any program
with only 24 bits will break then.

"Won't happen"? Well, up until recently only 16 bits were used, and
programmers freely mixed UTF-16 and UCS-16. But then the archaeologist came.

Of course we are currently repeating that mistake: UTF-32 is variable length
as well and should not be mixed with UCS-32.

With regards

Martin

-- 
mailto://krischik@users.sourceforge.net
http://www.ada.krischik.com





* Re: Supporting full Unicode
  2004-05-12 10:16     ` Björn Persson
@ 2004-05-12 10:57       ` Ludovic Brenta
  2004-05-12 14:53         ` Björn Persson
  0 siblings, 1 reply; 25+ messages in thread
From: Ludovic Brenta @ 2004-05-12 10:57 UTC (permalink / raw)



Bjorn Persson wrote:
> David Starner wrote:
>> they should have defined Wide_Character to be UTF-16 like Java did.
> 
> Keeping in mind that in UTF-16 some characters take two bytes and
> others take four, how do you propose to define that type?

It is true that variable-width encodings such as UTF-16 or UTF-8 are
more difficult to handle than fixed-width encodings like UCS-2 or
UCS-4.  Basically, if you want to do advanced processing of character
data, you may find it easier to first transcode it to UCS-4
(i.e. Wide_Wide_Character, 32 bits wide).

But UTF-8 is gaining momentum.  Originally intended as an external
encoding only, it is now in use as an internal encoding, too.  I
suppose it turned out that processing UTF-8 directly is not that
difficult after all.  This is especially true if all you want to do is
localisation of software using gettext; in this case, you can use
UTF-8 as both your internal and external encoding without any trouble.

The Perl regular expression engine, for example, supports UTF-8
strings directly.  I don't know if it transcodes to UCS-4 internally.
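
As an illustration of why direct processing is manageable: in UTF-8 the lead
byte alone determines the length of an encoded character. A hypothetical
helper, not taken from any particular library:

```ada
type Octet is mod 2**8;

--  Number of bytes in the UTF-8 sequence that starts with Lead,
--  or 0 if Lead is a continuation byte or invalid.
function Sequence_Length (Lead : Octet) return Natural is
begin
   if    Lead < 16#80# then return 1;  -- 0xxxxxxx: plain ASCII
   elsif Lead < 16#C0# then return 0;  -- 10xxxxxx: continuation byte
   elsif Lead < 16#E0# then return 2;  -- 110xxxxx: 2-byte sequence
   elsif Lead < 16#F0# then return 3;  -- 1110xxxx: 3-byte sequence
   elsif Lead < 16#F8# then return 4;  -- 11110xxx: 4-byte sequence
   else                     return 0;  -- not valid in Unicode UTF-8
   end if;
end Sequence_Length;
```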

-- 
Ludovic Brenta.






* RE: Supporting full Unicode
@ 2004-05-12 12:40 amado.alves
  2004-05-12 14:34 ` Martin Krischik
  0 siblings, 1 reply; 25+ messages in thread
From: amado.alves @ 2004-05-12 12:40 UTC (permalink / raw)
  To: comp.lang.ada

>> An ugly fact about Unicode is that the code space is 24-bit...

> Not quite right. The current code space is 32 bit of which only 24 bits are used...

You made me go for the book to see if this had changed. It has not. The codespace is 24 bit. Actually it's 21.

"In the Unicode Standard, the codespace consists of the integers from 0 to 10FFFF..."

(The Unicode Standard, Version 4.0.1, section 2.4)





* RE: Supporting full Unicode
@ 2004-05-12 14:12 amado.alves
  0 siblings, 0 replies; 25+ messages in thread
From: amado.alves @ 2004-05-12 14:12 UTC (permalink / raw)
  To: comp.lang.ada

"The codespace is 24 bit. Actually it's 21."

Actually it's 20.08746284 :-) (that is, log2 of the 16#110000# = 1,114,112
code points in the codespace).
 




* RE: Supporting full Unicode
  2004-05-12 12:40 amado.alves
@ 2004-05-12 14:34 ` Martin Krischik
  2004-05-12 18:24   ` David Starner
  2004-05-12 20:04   ` Florian Weimer
  0 siblings, 2 replies; 25+ messages in thread
From: Martin Krischik @ 2004-05-12 14:34 UTC (permalink / raw)


amado.alves wrote:

>>> An ugly fact about Unicode is that the code space is 24-bit...
> 
>> Not quite right. The current code space is 32 bit of which only 24 bits
>> are used...
> 
> You made me go for the book to see if this had changed. It has not. The
> codespace is 24 bit. Actually it's 21.
> 
> "In the Unicode Standard, the codespace consists of the integers from 0 to
> 10FFFF..."
> 
> (The Unicode Standard, Version 4.0.1, section 2.4)

If you mean the currently used code space then yes. But they extend the
codespace from time to time.

With Regards

Martin

-- 
mailto://krischik@users.sourceforge.net
http://www.ada.krischik.com





* Re: Supporting full Unicode
  2004-05-12 10:57       ` Ludovic Brenta
@ 2004-05-12 14:53         ` Björn Persson
  2004-05-12 18:55           ` David Starner
  0 siblings, 1 reply; 25+ messages in thread
From: Björn Persson @ 2004-05-12 14:53 UTC (permalink / raw)


Ludovic Brenta wrote:

> Bjorn Persson wrote:
> 
>>David Starner wrote:
>>
>>>they should have defined Wide_Character to be UTF-16 like Java did.
>>
>>Keeping in mind that in UTF-16 some characters take two bytes and
>>others take four, how do you propose to define that type?

[...]

> But UTF-8 is gaining momentum.  Originally intended as an external
> encoding only, it is now in use as an internal encoding, too.

[...]

I'm not trying to stop anyone from using whatever encoding they like 
internally. Just make sure you always know which encoding you have.

I just asked how David wanted to define a UTF-16 *character* type. A 
UTF-16 *string* can be represented as an array of 16-bit elements, but 
those elements aren't characters. In some cases an element is just half 
a character. Only the fixed-width encodings can be easily represented as 
arrays of characters.

Here's a datatype that can represent all UTF-16 characters and doesn't 
accept illegal characters:

type Subrange is (One_Byte_Below_Hole,
                  One_Byte_Above_Hole,
                  Two_Byte);
type Code_Point_Below_Hole is range 0 .. 16#D7FF#;
type High_Surrogate_Code_Point is range 16#D800# .. 16#DBFF#;
type Low_Surrogate_Code_Point is range 16#DC00# .. 16#DFFF#;
type Code_Point_Above_Hole is range 16#E000# .. 16#FFFD#;
type UTF_16_Character (Block : Subrange) is record
   case Block is
      when One_Byte_Below_Hole =>
         Value_Below_Hole : Code_Point_Below_Hole;
      when One_Byte_Above_Hole =>
         Value_Above_Hole : Code_Point_Above_Hole;
      when Two_Byte =>
         High_Surrogate_Value : High_Surrogate_Code_Point;
         Low_Surrogate_Value  : Low_Surrogate_Code_Point;
   end case;
end record;

Looks troublesome, eh? For UTF-8 I don't think it's even possible to 
define such a type. I'd rather just define UTF-16 and UTF-8 strings as 
byte sequences and represent even single characters as strings.
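
For what it's worth, the scalar code point behind a value of the type above
can still be recovered; a sketch (the function name is made up):

```ada
--  Recover the Unicode code point (0 .. 16#10FFFF#) from a
--  UTF_16_Character as declared above.  Sketch only.
function To_Code_Point (C : UTF_16_Character) return Natural is
begin
   case C.Block is
      when One_Byte_Below_Hole =>
         return Natural (C.Value_Below_Hole);
      when One_Byte_Above_Hole =>
         return Natural (C.Value_Above_Hole);
      when Two_Byte =>
         --  Standard surrogate-pair combination.
         return 16#10000#
           + (Natural (C.High_Surrogate_Value) - 16#D800#) * 16#400#
           + (Natural (C.Low_Surrogate_Value) - 16#DC00#);
   end case;
end To_Code_Point;
```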

So go ahead and use UTF-8 in your programs, or Shift-JIS or EBCDIC for 
all I care, but think twice before you define datatypes for 
variable-width *characters*.

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu





* Re: Supporting full Unicode
  2004-05-12 10:43     ` Martin Krischik
@ 2004-05-12 14:56       ` Björn Persson
  2004-05-12 19:09       ` David Starner
  1 sibling, 0 replies; 25+ messages in thread
From: Björn Persson @ 2004-05-12 14:56 UTC (permalink / raw)


Martin Krischik wrote:

> UTF-32 is variable length as well and should not be mixed with UCS-32.

Can you show me a document that describes how to encode characters 
greater than 4294967295 in UTF-32?

-- 
Björn Persson

jor ers @sv ge.
b n_p son eri nu





* RE: Supporting full Unicode
  2004-05-12 14:34 ` Martin Krischik
@ 2004-05-12 18:24   ` David Starner
  2004-05-12 20:04   ` Florian Weimer
  1 sibling, 0 replies; 25+ messages in thread
From: David Starner @ 2004-05-12 18:24 UTC (permalink / raw)


On Wed, 12 May 2004 16:34:34 +0200, Martin Krischik wrote:

> amado.alves wrote:
> 
>> 
>> You made me go for the book to see if this had changed. It has not. The
>> codespace is 24 bit. Actually it's 21.
>> 
>> "In the Unicode Standard, the codespace consists of the integers from 0 to
>> 10FFFF..."
>> 
>> (The Unicode Standard, Version 4.0.1, section 2.4)
> 
> If you mean the currently used code space then yes. But they extend the
> codespace from time to time.

No, they don't. They extended it once, in the Unicode/ISO-10646 merge in
1996, as well as moving characters around. There is no plan to do either
ever again.

There's no reason to extend the code space, either. There are over a million
code points, and in a decade they've managed to fill about a hundred
thousand of them. The Unicode roadmaps <http://www.unicode.org/roadmaps/>
have space blocked out for every character they think they might want to
encode, from Rongorongo to Egyptian Hieroglyphics. All of it fits in
three planes of 65,536 characters with plenty of room to spare on one of
them. It's thought that Chinese characters might exceed their current plane
and need a new one. Even with two planes for Chinese, there are still only
four planes in use, plus two planes permanently dedicated to Private Use
characters, leaving 650,000+ code points with no conceivable use. Short of
extraterrestrials, there's no point in extending the code space.





* Re: Supporting full Unicode
  2004-05-12 14:53         ` Björn Persson
@ 2004-05-12 18:55           ` David Starner
  0 siblings, 0 replies; 25+ messages in thread
From: David Starner @ 2004-05-12 18:55 UTC (permalink / raw)


On Wed, 12 May 2004 14:53:23 +0000, Björn Persson wrote:

> Looks troublesome, eh? For UTF-8 I don't think it's even possible to 
> define such a type. I'd rather just define UTF-16 and UTF-8 strings as 
> byte sequences and represent even single characters as strings.

Why would you encode UTF-16 as byte sequences when you could encode it as
a series of words? You can't use UTF-16 internally as byte sequences
without worrying about byte-order marks, because UTF-16 is by construction
ambiguous as to whether it's big-endian or little-endian. Anything you
defined would really be UTF-16BE or UTF-16LE, and would spend a lot of
time reassembling character pieces on the wrong-endian architecture.
UTF-16 should usually be encoded as words.
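
In Ada terms, "a series of words" would be a sketch along these lines (the
names are illustrative):

```ada
--  UTF-16 stored as 16-bit code units in the machine's native byte
--  order; endianness then only matters at I/O boundaries, not
--  internally.
type UTF_16_Code_Unit is mod 2**16;
type UTF_16_String is array (Positive range <>) of UTF_16_Code_Unit;
```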

As for characters, they're not much use with Unicode. Even with Latin-1,
you can't uppercase character to character, and any system that does
it is wrong, including Ada. The German eszett (ß) uppercases to SS. You
can't even hold a whole "character" in a character in Unicode, because
you can't fit any attached combining characters in.
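
Ada's own Latin-1 handling illustrates the point. To_Upper must return
exactly one Character, so it cannot produce "SS"; per RM A.3.2 it returns
its argument unchanged when no single-character uppercase exists. A small
sketch (the procedure name is made up):

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Characters.Latin_1;

procedure Sharp_S_Demo is
   use Ada.Characters;
   Sharp_S : constant Character := Latin_1.LC_German_Sharp_S;  -- 'ß'
begin
   --  Latin-1 has no single-character uppercase of 'ß', so To_Upper
   --  hands it back unchanged; the linguistically correct result
   --  would be the two-letter sequence "SS".
   Ada.Text_IO.Put_Line ("Uppercased: " & Handling.To_Upper (Sharp_S));
end Sharp_S_Demo;
```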




* Re: Supporting full Unicode
  2004-05-12 10:43     ` Martin Krischik
  2004-05-12 14:56       ` Björn Persson
@ 2004-05-12 19:09       ` David Starner
  1 sibling, 0 replies; 25+ messages in thread
From: David Starner @ 2004-05-12 19:09 UTC (permalink / raw)


On Wed, 12 May 2004 12:43:22 +0200, Martin Krischik wrote:

> Won't happen. Well, up until recently only 16 bits were used, and
> programmers freely mixed UTF-16 and UCS-16. But then the archaeologist came.

Until 1996. It's been clear since at least then that Chinese would need
more than that--if you threw Chinese ideographs out of the standard, the
rest would fit in 16 bits, and if you threw everything else out of the
standard, it still wouldn't fit in 16 bits. The standard in 1996 told
programmers that they would need to implement a 21 bit code space, and
told them how to do it; the only thing that waited until recently was
actually adding characters there.





* Re: Supporting full Unicode
  2004-05-12  8:23   ` Marius Amado Alves
  2004-05-12 10:43     ` Martin Krischik
@ 2004-05-12 19:25     ` David Starner
  1 sibling, 0 replies; 25+ messages in thread
From: David Starner @ 2004-05-12 19:25 UTC (permalink / raw)


> Indeed UTF-8 seems to rule. Probably because there are more ready-to-use low
> level tools for 8-bit characters. Actually the proper tools for Unicode
> should be 24-bit based. An ugly fact about Unicode is that the code space is
> 24-bit and the encodings are all but 24 (8, 16, 32).

Why is that ugly? UTF-16 or UTF-8 is virtually always going to be smaller,
unless most of your text is in an obscure dead tongue, which is unlikely
to be found in quantities that need compression. It's not going to be faster
to process, unless you're running on some terribly obscure architecture
that natively handles 24-bit words.

As someone else pointed out, it's not 24, it's roughly 20.1.

As for compression, a comparison of compression formats on various Unicode
encodings was made[1], and it was found that most of the difference
between encodings was wiped out by compression.

[1] http://www.cs.fit.edu/~ryan/compress/




* Re: Supporting full Unicode
  2004-05-12 14:34 ` Martin Krischik
  2004-05-12 18:24   ` David Starner
@ 2004-05-12 20:04   ` Florian Weimer
  1 sibling, 0 replies; 25+ messages in thread
From: Florian Weimer @ 2004-05-12 20:04 UTC (permalink / raw)


* Martin Krischik:

> If you mean the currently used code space then yes. But they extend the
> codespace from time to time.

There's a promise from both the Unicode Consortium and the WG behind
ISO 10646-1 that they won't assign characters outside the first
seventeen planes.

-- 
Current mail filters: many dial-up/DSL/cable modem hosts, and the
following domains: atlas.cz, bigpond.com, di-ve.com, hotmail.com,
jumpy.it, libero.it, netscape.net, postino.it, simplesnet.pt,
tiscali.co.uk, tiscali.cz, tiscali.it, voila.fr, yahoo.com.




* Re: Supporting full Unicode
  2004-05-11 17:45 Supporting full Unicode Brian Catlin
  2004-05-12  7:44 ` Ludovic Brenta
  2004-05-12  9:30 ` Martin Krischik
@ 2004-05-13  1:15 ` Randy Brukardt
  2004-05-13 17:58   ` Brian Catlin
  2004-05-14  4:00 ` Vadim Godunko
  3 siblings, 1 reply; 25+ messages in thread
From: Randy Brukardt @ 2004-05-13  1:15 UTC (permalink / raw)


"Brian Catlin" <BrianC@sannas.org.bad> wrote in message
news:9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net...
> The complete definition of Unicode allows for 2-, 3-, and 4-byte
> characters.  How is this supported in Ada95 and Ada0y?

It hasn't completely been decided yet. AI-285 is the AI for that. There are
some issues that have to be taken up at the WG9 level (the ARG cannot make a
decision because some countries oppose the only practical solutions).

It doesn't look like there is going to be any real support for UCS-8 or
UCS-16, though.

               Randy.







* Re: Supporting full Unicode
  2004-05-13  1:15 ` Randy Brukardt
@ 2004-05-13 17:58   ` Brian Catlin
  2004-05-13 19:42     ` Randy Brukardt
  0 siblings, 1 reply; 25+ messages in thread
From: Brian Catlin @ 2004-05-13 17:58 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> wrote in message 
news:mcidnS40INqCUT_dRVn-sw@megapath.net...
> "Brian Catlin" <BrianC@sannas.org.bad> wrote in message
> news:9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net...
>> The complete definition of Unicode allows for 2-, 3-, and 4-byte
>> characters.  How is this supported in Ada95 and Ada0y?
>
> It hasn't completely been decided yet. AI-285 is the AI for that. There are
> some issues that have to be taken up at the WG9 level (the ARG cannot make a
> decision because some countries oppose the only practical solutions).

Why would anyone oppose it?

 -Brian

> It doesn't look like there is going to be any real support for UCS-8 or
> UCS-16, though.
>
>               Randy.
>
>
> 






* Re: Supporting full Unicode
  2004-05-13 17:58   ` Brian Catlin
@ 2004-05-13 19:42     ` Randy Brukardt
  2004-05-14  8:40       ` Andersen Jacob Sparre
  0 siblings, 1 reply; 25+ messages in thread
From: Randy Brukardt @ 2004-05-13 19:42 UTC (permalink / raw)


"Brian Catlin" <BrianC@sannas.org.bad> wrote in message
news:oHOoc.18832$Hs1.18092@newsread2.news.pas.earthlink.net...
> "Randy Brukardt" <randy@rrsoftware.com> wrote in message
> news:mcidnS40INqCUT_dRVn-sw@megapath.net...
> > "Brian Catlin" <BrianC@sannas.org.bad> wrote in message
> > news:9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net...
> >> The complete definition of Unicode allows for 2-, 3-, and 4-byte
> >> characters.  How is this supported in Ada95 and Ada0y?
> >
> > It hasn't completely been decided yet. AI-285 is the AI for that. There are
> > some issues that have to be taken up at the WG9 level (the ARG cannot make a
> > decision because some countries oppose the only practical solutions).
>
> Why would anyone oppose it?

Because Unicode is not an International Standard, and the needed
transformation tables are not defined by ISO-10646.

You might say that this is silly (I would!), but these sorts of issues have
been debated for years in the character set groups -- and we're just
collecting some of the fall-out. There probably are carefully-crafted
compromise positions that prevent us from doing what we need to do.

                       Randy.







* Re: Supporting full Unicode
  2004-05-11 17:45 Supporting full Unicode Brian Catlin
                   ` (2 preceding siblings ...)
  2004-05-13  1:15 ` Randy Brukardt
@ 2004-05-14  4:00 ` Vadim Godunko
  2004-05-14 17:51   ` Brian Catlin
  3 siblings, 1 reply; 25+ messages in thread
From: Vadim Godunko @ 2004-05-14  4:00 UTC (permalink / raw)


"Brian Catlin" <BrianC@sannas.org.bad> wrote in message news:<9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net>...
> The complete definition of Unicode allows for 2-,3-, and 4-byte characters.  How 
> is this supported in Ada95 and Ada0y?
> 
Please remember that Unicode is not only a definition of different
character encodings, but also (and this is more important) a definition
of a lot of character properties and handling algorithms.

Some time ago we tried to write an Ada Unicode library; you may find a
prototype at the Ada-Russia site:

http://www.ada-ru.org/files/ais-0.0.1.tar.gz




* Re: Supporting full Unicode
  2004-05-13 19:42     ` Randy Brukardt
@ 2004-05-14  8:40       ` Andersen Jacob Sparre
  2004-05-14 20:20         ` Randy Brukardt
  0 siblings, 1 reply; 25+ messages in thread
From: Andersen Jacob Sparre @ 2004-05-14  8:40 UTC (permalink / raw)


Randy Brukardt wrote:

> Because Unicode is not an International Standard, and the needed
> transformation tables are not defined by ISO-10646.

And there is no way to get them into ISO-10646?

I think I can see the point in letting the Ada standard depend only on
other international standards.

Jacob
-- 
"Any, sufficiently advanced, technology is indistinguishable from magic."




* Re: Supporting full Unicode
  2004-05-14  4:00 ` Vadim Godunko
@ 2004-05-14 17:51   ` Brian Catlin
  0 siblings, 0 replies; 25+ messages in thread
From: Brian Catlin @ 2004-05-14 17:51 UTC (permalink / raw)


"Vadim Godunko" <vgodunko@vipmail.ru> wrote in message 
news:665e587a.0405132000.43e7b2f8@posting.google.com...
> "Brian Catlin" <BrianC@sannas.org.bad> wrote in message 
> news:<9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net>...
>> The complete definition of Unicode allows for 2-,3-, and 4-byte characters. 
>> How
>> is this supported in Ada95 and Ada0y?
>>
> Please remember that Unicode is not only a definition of different
> character encodings, but also (and this is more important) a definition
> of a lot of character properties and handling algorithms.
>
> Some time ago we tried to write an Ada Unicode library; you may find a
> prototype at the Ada-Russia site:
> http://www.ada-ru.org/files/ais-0.0.1.tar.gz

Great!

Thanks
 -Brian 






* Re: Supporting full Unicode
  2004-05-14  8:40       ` Andersen Jacob Sparre
@ 2004-05-14 20:20         ` Randy Brukardt
  0 siblings, 0 replies; 25+ messages in thread
From: Randy Brukardt @ 2004-05-14 20:20 UTC (permalink / raw)


"Andersen Jacob Sparre" <sparre@jacob.crs4.it> wrote in message
news:rlslljvmmes.fsf@jacob.crs4.it...
> Randy Brukardt wrote:
>
> > Because Unicode is not an International Standard, and the needed
> > transformation tables are not defined by ISO-10646.
>
> And there is no way to get them into ISO-10646?

I presume that there are political fights that have blocked that. But I
can't say much about the inner workings of other ISO groups. The CO group is
struggling with these same issues; we're not alone here.

                 Randy.






