From: "Dmitry A. Kazakov"
Subject: Re: System.WCh_Cnv
Newsgroups: comp.lang.ada
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
Date: Tue, 25 Jul 2006 14:21:21 +0200
Message-ID: <1nbqjel4blzuj$.obwkz78gfdph$.dlg@40tude.net>

On Tue, 25 Jul 2006 11:31:08 +0100, Marius Amado-Alves wrote:

>>> Actually the Unicode codepoint range is 0 .. 10FFFF and therefore
>>> fits in 21 bits.
>>
>> ... the definition would allow expansion to 31-bits (but no
>> further).
>
> The definition of some particular *encoding* namely UCS-4. Not of the
> "character set" range. Character = codepoint. And this stops at
> 10FFFF. And it will not be extended. IIRC both Organizations went on
> record on this. Silly maybe, but not per se. It has to do with
> variable length encodings. It facilitates search and verification.
> Now these encodings may be a bit silly, yes.
>
> I have been sketching a highly simplified, short, clear, logical,
> understandable, usable, no nonsense, package for Unicode. I have not
> been making much progress for several reasons. If someone wants to
> join that would be great. The first lines of the spec follow.
>
> -- Unico : no nonsense Unicode support for Ada
> -- (C) 2006 Marius Amado Alves
>
> with Ada.Containers.Vectors;
> with Ada.Streams;
>
> package Unico is
>
>    type Character is range 0 .. 16#10FFFF#;
>    for Character'Size use 24;
>
>    procedure Write
>      (Stream : access Ada.Streams.Root_Stream_Type'Class;
>       Item   : in Character);
>
>    procedure Read
>      (Stream : access Ada.Streams.Root_Stream_Type'Class;
>       Item   : out Character);

[...]

But how can you read/write it ignoring encoding?

As for the Character = code point idea, I think it was wrong from the
very start, in the form of Wide_Character. The advantages of being able
to index each individual code point in a string are minor compared with
the mess it brings with it. They become almost invisible once you take
into account that the places where indexing might be needed, like text
rendering, don't work on a per-code-point basis anyway.

So I'm quite happy with UTF-8 and plain strings.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
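
To make the encoding question concrete: any Write for a 0 .. 16#10FFFF# type has to choose some octet representation. The following is only a minimal sketch of one such choice, UTF-8, and is not part of the quoted Unico spec. The names Code_Point, Octet, Octet_Array and To_UTF8 are hypothetical, and the sketch does not reject surrogate values.

```ada
with Ada.Text_IO;

procedure UTF8_Sketch is

   --  Same range as Character in the quoted spec; a different name is
   --  used here (an assumption) to avoid hiding Standard.Character.
   type Code_Point is range 0 .. 16#10FFFF#;

   type Octet is mod 2 ** 8;
   type Octet_Array is array (Positive range <>) of Octet;

   --  Encode one code point as 1 to 4 UTF-8 octets.
   --  Surrogates (16#D800# .. 16#DFFF#) are not rejected in this sketch.
   function To_UTF8 (Item : Code_Point) return Octet_Array is
   begin
      if Item < 16#80# then
         --  0xxxxxxx
         return (1 => Octet (Item));
      elsif Item < 16#800# then
         --  110xxxxx 10xxxxxx
         return (Octet (16#C0# + Item / 2 ** 6),
                 Octet (16#80# + Item mod 2 ** 6));
      elsif Item < 16#1_0000# then
         --  1110xxxx 10xxxxxx 10xxxxxx
         return (Octet (16#E0# + Item / 2 ** 12),
                 Octet (16#80# + (Item / 2 ** 6) mod 2 ** 6),
                 Octet (16#80# + Item mod 2 ** 6));
      else
         --  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
         return (Octet (16#F0# + Item / 2 ** 18),
                 Octet (16#80# + (Item / 2 ** 12) mod 2 ** 6),
                 Octet (16#80# + (Item / 2 ** 6) mod 2 ** 6),
                 Octet (16#80# + Item mod 2 ** 6));
      end if;
   end To_UTF8;

   --  U+20AC EURO SIGN, used only as a sample value
   Bytes : constant Octet_Array := To_UTF8 (16#20AC#);

begin
   --  Print the octets in decimal, separated by spaces
   for I in Bytes'Range loop
      Ada.Text_IO.Put (Octet'Image (Bytes (I)));
   end loop;
   Ada.Text_IO.New_Line;
end UTF8_Sketch;
```

For U+20AC this yields the three octets 226 130 172 (E2 82 AC). A UCS-4 writer would instead emit four octets per code point, in some byte order, so a stream-level Read/Write cannot avoid committing to an encoding.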