comp.lang.ada
From: Ludovic Brenta <ludovic.brenta@insalien.org>
Subject: Re: Supporting full Unicode
Date: 12 May 2004 10:57:25 GMT
Message-ID: <2004512-125725-433248@foorum.com>
In-Reply-To: dQmoc.58891$mU6.238072@newsb.telia.net


Bjorn Persson wrote:
> David Starner wrote:
>> they should have defined Wide_Character to be UTF-16 like Java did.
> 
> Keeping in mind that in UTF-16 some characters take two bytes and
> others take four, how do you propose to define that type?

It is true that variable-width encodings such as UTF-16 or UTF-8 are
more difficult to handle than fixed-width encodings like UCS-2 or
UCS-4.  Basically, if you want to do advanced processing of character
data, you may find it easier to first transcode it to UCS-4
(i.e. Wide_Wide_Character, 32 bits wide).
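
To make the transcoding step concrete, here is a minimal sketch in
Python (chosen only for brevity; it is not Ada, and the function name
utf16_to_ucs4 is made up for this illustration).  It turns a sequence
of 16-bit UTF-16 code units into a list of 32-bit code points, joining
each surrogate pair into a single value.

```python
def utf16_to_ucs4(units):
    """Transcode 16-bit UTF-16 code units into a list of code points.

    Hypothetical helper for illustration; raises ValueError on
    unpaired surrogates.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is malformed.
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            # Any other unit is a complete character on its own.
            out.append(u)
            i += 1
    return out


# "A" followed by U+1D11E, which needs a surrogate pair in UTF-16.
code_points = utf16_to_ucs4([0x0041, 0xD834, 0xDD1E])
```

Once the data is in this form, every element is exactly one character,
so indexing and length are trivial again.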

But UTF-8 is gaining momentum.  Originally intended as an external
encoding only, it is now in use as an internal encoding, too.  I
suppose that it turned out that processing UTF-8 directly is not that
difficult after all.  This is especially true if all you want to do is
localisation of software using gettext; in this case, you can use
UTF-8 as both your internal and external encoding without any trouble.
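
As a rough illustration of why working on UTF-8 directly is
manageable, here is a small Python sketch (again not Ada; the names
utf8_length and utf8_find are invented for this example, and it
assumes the input is well-formed UTF-8).  Continuation bytes always
have the bit pattern 10xxxxxx, so characters can be counted, and
substrings located, without decoding anything.

```python
def utf8_length(data: bytes) -> int:
    """Count characters in well-formed UTF-8 by skipping continuation bytes."""
    # Every byte that is not of the form 10xxxxxx starts a new character.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_find(haystack: bytes, needle: bytes) -> int:
    """Return the character index of needle in haystack, or -1 if absent.

    Both arguments are assumed to be complete, well-formed UTF-8 strings.
    A plain byte search is safe because a lead byte can never be mistaken
    for a continuation byte, so a match cannot start mid-character.
    """
    pos = haystack.find(needle)
    if pos < 0:
        return -1
    # Convert the byte offset into a character offset.
    return utf8_length(haystack[:pos])


text = "naïve café €5".encode("utf-8")
n_chars = utf8_length(text)
cafe_at = utf8_find(text, "café".encode("utf-8"))
```

For the gettext case not even this much is needed, since the
translated strings are simply passed through as opaque byte sequences.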

The Perl regular expression engine, for example, supports UTF-8
strings directly.  I don't know if it transcodes to UCS-4 internally.

-- 
Ludovic Brenta.

