comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: System.WCh_Cnv
Date: Tue, 25 Jul 2006 14:21:21 +0200
Message-ID: <1nbqjel4blzuj$.obwkz78gfdph$.dlg@40tude.net>
In-Reply-To: mailman.47.1153823488.30988.comp.lang.ada@ada-france.org

On Tue, 25 Jul 2006 11:31:08 +0100, Marius Amado-Alves wrote:

>>> Actually the Unicode codepoint range is 0 .. 10FFFF and therefore
>>> fits in 21 bits.
>>
>> ... the definition would allow expansion to 31-bits (but no
>> further).
> 
> The definition of some particular *encoding* namely UCS-4. Not of the  
> "character set" range. Character = codepoint. And this stops at  
> 10FFFF. And it will not be extended. IIRC both Organizations went on  
> record on this. Silly maybe, but not per se. It has to do with  
> variable length encodings. It facilitates search and verification.  
> Now these encodings may be a bit silly, yes.
> 
> I have been sketching a highly simplified, short, clear, logical,  
> understandable, usable, no nonsense, package for Unicode. I have not  
> been making much progress for several reasons. If someone wants to  
> join that would be great. The first lines of the spec follow.
> 
> -- Unico : no nonsense Unicode support for Ada
> -- (C) 2006 Marius Amado Alves
> 
> with Ada.Containers.Vectors;
> with Ada.Streams;
> 
> package Unico is
> 
>     type Character is range 0 .. 16#10FFFF#;
>     for Character'Size use 24;
> 
>     procedure Write
>       (Stream : access Ada.Streams.Root_Stream_Type'Class;
>        Item   : in Character);
> 
>     procedure Read
>       (Stream : access Ada.Streams.Root_Stream_Type'Class;
>        Item   : out Character);
[...] 

But how can you read/write it ignoring encoding?

As for the Character = code point idea, I think it was wrong from the very
start, in the form of Wide_Character. The advantages of being able to index
each individual code point in a string are minor compared with the mess it
brings. They become almost invisible once one takes into account that the
places where indexing might be needed, like text rendering, don't work on a
per-code-point basis anyway. So I'm quite happy with UTF-8 and plain strings.
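
To make the encoding question concrete, here is a minimal sketch of what
"UTF-8 and plain strings" can look like. It is only an illustration, not part
of the quoted Unico spec: the names UTF8_Sketch, Code_Point and To_UTF_8 are
hypothetical, and the sketch does not reject surrogates or handle decoding.
It shows that writing a code point to a stream means first choosing a byte
layout, and that a String's length then counts octets, not code points.

```ada
with Ada.Text_IO;

procedure UTF8_Sketch is

   --  Same range as the quoted Unico.Character; named Code_Point here
   --  (an assumption) so that Standard.Character stays visible.
   type Code_Point is range 0 .. 16#10FFFF#;

   --  Encode one code point as 1 to 4 UTF-8 octets in a plain String.
   --  Surrogates (16#D800# .. 16#DFFF#) are not rejected in this sketch.
   function To_UTF_8 (Item : Code_Point) return String is
   begin
      if Item < 16#80# then
         return (1 => Character'Val (Item));
      elsif Item < 16#800# then
         return (Character'Val (16#C0# + Item / 16#40#),
                 Character'Val (16#80# + Item mod 16#40#));
      elsif Item < 16#1_0000# then
         return (Character'Val (16#E0# + Item / 16#1000#),
                 Character'Val (16#80# + (Item / 16#40#) mod 16#40#),
                 Character'Val (16#80# + Item mod 16#40#));
      else
         return (Character'Val (16#F0# + Item / 16#4_0000#),
                 Character'Val (16#80# + (Item / 16#1000#) mod 16#40#),
                 Character'Val (16#80# + (Item / 16#40#) mod 16#40#),
                 Character'Val (16#80# + Item mod 16#40#));
      end if;
   end To_UTF_8;

   E_Acute : constant String := To_UTF_8 (16#E9#);     --  U+00E9
   G_Clef  : constant String := To_UTF_8 (16#1D11E#);  --  U+1D11E

begin
   --  'Length is the number of octets, so one code point may occupy
   --  several positions of the String.
   Ada.Text_IO.Put_Line ("U+00E9  octets:" & Integer'Image (E_Acute'Length));
   Ada.Text_IO.Put_Line ("U+1D11E octets:" & Integer'Image (G_Clef'Length));
end UTF8_Sketch;
```

With this layout U+00E9 takes two octets (C3 A9) and U+1D11E takes four
(F0 9D 84 9E), so indexing such a String addresses octets, and any Read or
Write of a code point has to agree on the encoding first.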

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


