From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,43ab55a75a8b5d1
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news2.google.com!news4.google.com!news.glorb.com!news-out1.kabelfoon.nl!newsfeed.kabelfoon.nl!bandi.nntp.kabelfoon.nl!newsfeed.freenet.de!news.netcologne.de!newsfeed-fusi2.netcologne.de!newsfeed01.chello.at!newsfeed.arcor.de!news.arcor.de!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: System.WCh_Cnv
Newsgroups: comp.lang.ada
User-Agent: 40tude_Dialog/2.0.15.1
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <EBEKJMEEPPFAACCBBGNHAELNDIAA.randy@rrsoftware.com>
 <mailman.47.1153823488.30988.comp.lang.ada@ada-france.org>
Date: Tue, 25 Jul 2006 14:21:21 +0200
Message-ID: <1nbqjel4blzuj$.obwkz78gfdph$.dlg@40tude.net>
NNTP-Posting-Date: 25 Jul 2006 14:21:21 MEST
NNTP-Posting-Host: 75b26032.newsread4.arcor-online.net
X-Trace: 
 DXC=D3m=ii\?KZLRG:gi]CKmZG:ejgIfPPldDjW\KbG]kaMHea\9g\;7NmEI[l:ilTIH?I[6LHn;2LCVN7enW;^6ZC`DIXm65S@:3>O
X-Complaints-To: usenet-abuse@arcor.de
Xref: g2news2.google.com comp.lang.ada:5916
Date: 2006-07-25T14:21:21+02:00
List-Id: <comp.lang.ada>

On Tue, 25 Jul 2006 11:31:08 +0100, Marius Amado-Alves wrote:

>>> Actually the Unicode codepoint range is 0 .. 10FFFF and therefore
>>> fits in 21 bits.
>>
>> ... the definition would allow expansion to 31-bits (but no
>> further).
> 
> The definition of some particular *encoding* namely UCS-4. Not of the  
> "character set" range. Character = codepoint. And this stops at  
> 10FFFF. And it will not be extended. IIRC both Organizations went on  
> record on this. Silly maybe, but not per se. It has to do with  
> variable length encodings. It facilitates search and verification.  
> Now these encodings may be a bit silly, yes.
> 
> I have been sketching a highly simplified, short, clear, logical,  
> understandable, usable, no nonsense, package for Unicode. I have not  
> been making much progress for several reasons. If someone wants to  
> join that would be great. The first lines of the spec follow.
> 
> -- Unico : no nonsense Unicode support for Ada
> -- (C) 2006 Marius Amado Alves
> 
> with Ada.Containers.Vectors;
> with Ada.Streams;
> 
> package Unico is
> 
>     type Character is range 0 .. 16#10FFFF#;
>     for Character'Size use 24;
> 
>     procedure Write
>       (Stream : access Ada.Streams.Root_Stream_Type'Class;
>        Item   : in Character);
> 
>     procedure Read
>       (Stream : access Ada.Streams.Root_Stream_Type'Class;
>        Item   : out Character);
[...] 

But how can you read/write it ignoring encoding?
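
To make the question concrete, here is a minimal sketch of what a Write would
have to decide, assuming UTF-8 is the chosen encoding. The names UTF_8_Demo,
Code_Point and To_UTF_8 are made up for illustration and are not part of the
quoted Unico spec. Surrogates (D800 .. DFFF) are not rejected here.

```ada
with Ada.Streams; use Ada.Streams;
with Ada.Text_IO;

procedure UTF_8_Demo is

   --  Hypothetical type, same range as the quoted Unico.Character
   type Code_Point is range 0 .. 16#10FFFF#;

   --  Hypothetical helper: encode one code point as UTF-8 stream elements.
   --  The number of elements (1 .. 4) depends on the value, which is
   --  exactly the decision a stream Write cannot avoid.
   function To_UTF_8 (Code : Code_Point) return Stream_Element_Array is
   begin
      if Code < 16#80# then
         return (1 => Stream_Element (Code));
      elsif Code < 16#800# then
         return (1 => Stream_Element (16#C0# + Code / 2**6),
                 2 => Stream_Element (16#80# + Code mod 2**6));
      elsif Code < 16#1_0000# then
         return (1 => Stream_Element (16#E0# + Code / 2**12),
                 2 => Stream_Element (16#80# + (Code / 2**6) mod 2**6),
                 3 => Stream_Element (16#80# + Code mod 2**6));
      else
         return (1 => Stream_Element (16#F0# + Code / 2**18),
                 2 => Stream_Element (16#80# + (Code / 2**12) mod 2**6),
                 3 => Stream_Element (16#80# + (Code / 2**6) mod 2**6),
                 4 => Stream_Element (16#80# + Code mod 2**6));
      end if;
   end To_UTF_8;

   --  U+20AC (euro sign) encodes as the three bytes E2 82 AC
   Bytes : constant Stream_Element_Array := To_UTF_8 (16#20AC#);

begin
   --  Prints the decimal values 226 130 172
   for I in Bytes'Range loop
      Ada.Text_IO.Put (Stream_Element'Image (Bytes (I)));
   end loop;
   Ada.Text_IO.New_Line;
end UTF_8_Demo;
```

A UCS-4 or UTF-16 Write would produce different stream elements for the same
value, so a Write that takes only the stream and the item has to fix one of
these choices silently.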

As for the Character = code point idea, I think it was wrong from the very
start, in the form of Wide_Character. The advantages of being able to index
each individual code point in a string are minor compared with the mess it
brings. They become almost invisible if one takes into account that the
places where indexing might be needed, like text rendering, don't work on a
per-code-point basis anyway. So I'm quite happy with UTF-8 and plain strings.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de