From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,bcb6f63419c2a56b
X-Google-Attributes: gid103376,public
Path: 
 controlnews3.google.com!news2.google.com!news.maxwell.syr.edu!newsfeed.icl.net!newsfeed.fjserv.net!feed.news.tiscali.de!newsfeed.stueberl.de!proxad.net!news.tiscali.fr!foorum!not-for-mail
From: Ludovic Brenta <ludovic.brenta@insalien.org>
Newsgroups: comp.lang.ada
Subject: Re: Supporting full Unicode
Date: 12 May 2004 10:57:25 GMT
Message-ID: <2004512-125725-433248@foorum.com>
References: <9j8oc.16324$V97.13312@newsread1.news.pas.earthlink.net>
 <2004512-94456-948110@foorum.com> <pan.2004.05.12.09.26.57.126499@email.ro>
 <dQmoc.58891$mU6.238072@newsb.telia.net>
NNTP-Posting-Host: 212.190.145.10
NNTP-Posting-Date: 12 May 2004 10:57:25 GMT
X-Complaints-To: abuse@foorum.fr
X-POSTER: foorum.com
X-Foorum_user_id: 
X-Foorum_user_tmp_id: 200454-93313-705548-212.190.145.10-foorum
X-Originating-User: 212.190.145.10
X-Newsreader: Foorum
Xref: controlnews3.google.com comp.lang.ada:476
Date: 2004-05-12T10:57:25+00:00
List-Id: <comp.lang.ada>


Bjorn Persson wrote:
> David Starner wrote:
>> they should have defined Wide_Character to be UTF-16 like Java did.
> 
> Keeping in mind that in UTF-16 some characters take two bytes and
> others take four, how do you propose to define that type?

It is true that variable-width encodings such as UTF-16 or UTF-8 are
more difficult to handle than fixed-width encodings like UCS-2 or
UCS-4.  Basically, if you want to do advanced processing of character
data, you may find it easier to first transcode it to UCS-4
(i.e. Wide_Wide_Character, 32 bits wide).
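
To make the width difference concrete, here is a minimal sketch in
Python (not Ada, purely for brevity).  The helper name to_ucs4 and the
sample string are made up for illustration.  It shows how many bytes
each character takes in UTF-8 and UTF-16, and that after transcoding to
one integer per code point, indexing is trivial again.

```python
from typing import List


def to_ucs4(text: str) -> List[int]:
    """Transcode to a fixed-width form: one integer per code point."""
    return [ord(ch) for ch in text]


# 'a', e-acute (U+00E9), and the musical G clef (U+1D11E, outside the BMP).
sample = "a\u00e9\U0001D11E"

# Bytes needed per character in each variable-width encoding.
utf8_widths = [len(ch.encode("utf-8")) for ch in sample]
utf16_widths = [len(ch.encode("utf-16-be")) for ch in sample]

# Fixed-width view: element N is always character N.
ucs4 = to_ucs4(sample)

print(utf8_widths)   # [1, 2, 4]
print(utf16_widths)  # [2, 2, 4]
print([hex(cp) for cp in ucs4])
```

In the variable-width forms, finding the third character means walking
the data from the start; in the fixed-width form it is just ucs4[2].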

But UTF-8 is gaining momentum.  Originally intended as an external
encoding only, it is now in use as an internal encoding, too.  I
suppose that it turned out that processing UTF-8 directly is not that
difficult after all.  This is especially true if all you want to do is
localisation of software using gettext; in this case, you can use
UTF-8 as both your internal and external encoding without any trouble.
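
As a rough sketch of why direct processing is manageable (again in
Python for brevity, assuming the input is valid UTF-8; the function
name and the sample message are hypothetical): continuation bytes
always have the bit pattern 10xxxxxx, so code points can be counted,
and substrings searched for, on the raw bytes without decoding.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 without decoding it.

    Every byte that is not a continuation byte (10xxxxxx) starts
    a new code point.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


# A translated message as it might come back from a gettext catalogue.
msg = "Fichier non trouv\u00e9 : %s".encode("utf-8")

# Byte length and character count differ because of the e-acute.
n_bytes = len(msg)
n_chars = count_code_points(msg)

# Plain byte-wise search is safe: a valid UTF-8 sequence can never
# match in the middle of another character's encoding.
pos = msg.find("trouv\u00e9".encode("utf-8"))

print(n_bytes, n_chars, pos)
```

For the gettext case even this is rarely needed, since the program
mostly passes the translated strings through unchanged.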

The Perl regular expression engine, for example, supports UTF-8
strings directly.  I don't know if it transcodes to UCS-4 internally.

-- 
Ludovic Brenta.

