From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,88ed72d98e6b3457
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2003-10-05 16:57:38 PST
Path: 
 archiver1.google.com!news2.google.com!newsfeed.stanford.edu!headwall.stanford.edu!newshub.sdsu.edu!elnk-nf2-pas!newsfeed.earthlink.net!wn14feed!worldnet.att.net!204.127.198.203!attbi_feed3!attbi_feed4!attbi.com!sccrnsc02.POSTED!not-for-mail
Message-ID: <3F80AFE4.4010901@comcast.net>
From: "Robert I. Eachus" <rieachus@comcast.net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US;
 rv:1.0.2) Gecko/20021120 Netscape/7.01
X-Accept-Language: en-us, en
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Standard Library Interest?
References: <TdJfb.5879$RU4.57294@newsfep4-glfd.server.ntli.net>
 <3F7F760E.2020901@comcast.net> <blpl4g$ic7$1@a1-hrz.uni-duisburg.de>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
NNTP-Posting-Host: 24.34.139.183
X-Complaints-To: abuse@comcast.net
X-Trace: sccrnsc02 1065398257 24.34.139.183 (Sun, 05 Oct 2003 23:57:37 GMT)
NNTP-Posting-Date: Sun, 05 Oct 2003 23:57:37 GMT
Organization: Comcast Online
Date: Sun, 05 Oct 2003 23:57:37 GMT
Xref: archiver1.google.com comp.lang.ada:284
Date: 2003-10-05T23:57:37+00:00
List-Id: <comp.lang.ada>

Georg Bauhaus wrote:

> How can this be when Unicode has more than 65536 code positions?
> (Assuming I wanted to use full Unicode, I guess I will have to rely
> on Implementation Permissions to provide me with a corresponding
> character type?)

If you are that familiar with Unicode... Ada Wide_Character corresponds
to the ISO 10646 BMP (Basic Multilingual Plane), which is also the first
65536 code positions of Unicode.  ISO 10646 defines a four-octet form
for code points, organized as groups, planes, rows and cells, where each
plane holds 65536 code positions.  It also defines three encoding
mechanisms, UTF-8, UTF-16, and UTF-32.  Unicode in its usual 16-bit form
corresponds to UTF-16, where most 16-bit code units map to single code
points, and code units in the surrogates area are used in pairs to
encode code points from other planes.  Each pair consists of a high
surrogate from the range 16#D800# to 16#DBFF# followed by a low
surrogate from the range 16#DC00# to 16#DFFF#.  Technically Ada encodes
the BMP and will not damage any embedded surrogates, but a surrogate
pair will be counted as two Wide_Characters, not as a single code point.
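
To make the surrogate arithmetic concrete, here is a small sketch.  It
is Python rather than Ada, purely for illustration, and the function
names (to_surrogates, from_surrogates, utf16_units, count_code_points)
are my own and not part of any library.  It assumes the standard UTF-16
layout: a high surrogate in 16#D800# .. 16#DBFF# followed by a low
surrogate in 16#DC00# .. 16#DFFF#.

```python
# Illustrative only: UTF-16 surrogate pairs and why counting 16-bit
# units overcounts code points outside the BMP.

def to_surrogates(code_point):
    """Split a code point above 0xFFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000        # 20 significant bits remain
    high = 0xD800 + (offset >> 10)       # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)      # low 10 bits -> DC00..DFFF
    return high, low

def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("expected a high surrogate followed by a low one")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

def utf16_units(text):
    """Return the 16-bit code units that encode text in UTF-16."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            units.extend(to_surrogates(cp))
        else:
            units.append(cp)
    return units

def count_code_points(units):
    """Count code points, treating each well-formed pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_pair = (0xD800 <= units[i] <= 0xDBFF
                   and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        i += 2 if is_pair else 1
        count += 1
    return count

# 'A' followed by U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP).
sample = "A\U0001D11E"
units = utf16_units(sample)
print([hex(u) for u in units])              # three 16-bit units
print(len(units), count_code_points(units)) # units vs. code points
```

A Wide_String holding that sample would report a length of three,
which is the unit count, while the pairing-aware count is two.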

If anyone wants to use "full" Unicode in Ada, the more appropriate
approach would be to add support for UTF-32 as Wide_Wide_Character.  But
in practice, it makes no difference whether Ada treats Wide_Character as
the BMP or as UTF-16 code units, because of the way Unicode has defined
the surrogate characters: they are reserved code positions in the BMP
that never stand for a character on their own.  Most of the 'missing'
Unicode support has to do with display rules, which apply to rendering
(on printers, for instance), not to strings.

If you want to write a subprogram to determine the length of a
Wide_Character string in characters, you can't do it without adopting
specific language rules on what is or is not a character.  For example,
Hangul (the Korean script) combines up to three code points into a
single Hangul character, which represents a syllable.  Vietnamese is
another case: it can have several accent marks on a single (Latin)
character.
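
The two cases can be seen with a short sketch, again in Python for
illustration only.  It uses the standard unicodedata module; the helper
base_count is a hypothetical name of mine, and its rule (skip combining
marks) is just one naive candidate for "what is a character".

```python
import unicodedata

# Hangul syllable "han" written as three conjoining jamo, and the
# same syllable as a single precomposed code point.
han_jamo = "\u1112\u1161\u11AB"
han_syllable = "\uD55C"

# Vietnamese: Latin 'e' + combining circumflex + combining acute,
# and the single precomposed code point for the same letter.
e_decomposed = "e\u0302\u0301"
e_composed = "\u1EBF"

def base_count(text):
    """Naive 'character' count: ignore combining marks (categories M*)."""
    return sum(1 for ch in text
               if not unicodedata.category(ch).startswith("M"))

for label, text in (("Hangul", han_jamo), ("Vietnamese", e_decomposed)):
    composed = unicodedata.normalize("NFC", text)
    print(label,
          "code points:", len(text),
          "after NFC:", len(composed),
          "naive count:", base_count(text))
```

The naive rule gives one for the Vietnamese letter but three for the
Hangul syllable, because the jamo are letters and not combining marks.
Getting one for both needs a rule that knows about Hangul composition,
which is the point: the answer depends on the rules you adopt.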

It is certainly possible to have a (written language dependent) set of
categorization routines that correctly sorts Wide_Character
representations into appropriate categories for that language
(character, symbol, numeric digit, etc.).  But I would hesitate to even
try to come up with a language independent mapping.  For example, Pi is
a mathematical symbol in English, but a capital letter in Greek.
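
For comparison, the Unicode character database does attach one fixed
general category to each code point.  The sketch below (Python, for
illustration only; general_category is my own wrapper name) shows what
that single mapping says about Pi.  It is not a substitute for the
per-language routines described above.

```python
import unicodedata

def general_category(ch):
    """Return the Unicode general category of one code point."""
    return unicodedata.category(ch)

samples = [
    "\u03A0",  # GREEK CAPITAL LETTER PI
    "\u03C0",  # GREEK SMALL LETTER PI
    "\u220F",  # N-ARY PRODUCT (the large product sign)
    "5",
    "+",
]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {general_category(ch)}")
```

Both Greek code points come back as letters (Lu and Ll) no matter what
kind of text they appear in; only the separate product sign is a math
symbol (Sm).  So an English text that uses Pi as a symbol still gets
"letter" from the fixed mapping.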
-- 
                                     Robert I. Eachus

"Quality is the Buddha. Quality is scientific reality. Quality is the 
goal of Art. It remains to work these concepts into a practical, 
down-to-earth context, and for this there is nothing more practical or 
down-to-earth than what I have been talking about all along...the repair 
of an old motorcycle."  -- from Zen and the Art of Motorcycle 
Maintenance by Robert Pirsig