comp.lang.ada
From: David Starner <dvdeug@email.ro>
Subject: Re: Supporting full Unicode
Date: Wed, 12 May 2004 19:25:20 GMT
Message-ID: <pan.2004.05.12.19.10.16.123505@email.ro>
In-Reply-To: mailman.115.1084354437.313.comp.lang.ada@ada-france.org

> Indeed UTF-8 seems to rule. Probably because there are more ready-to-use low
> level tools for 8-bit characters. Actually the proper tools for Unicode
> should be 24-bit based. An ugly fact about Unicode is that the code space is
> 24-bit and the encodings are all but 24 (8, 16, 32).

Why is that ugly? UTF-8 or UTF-16 is virtually always going to be smaller
than a 24-bit encoding, unless most of your text is in an obscure dead
tongue encoded outside the Basic Multilingual Plane (where UTF-8 and
UTF-16 both need four bytes per character), and such text is unlikely to
be found in quantities that need compression. Nor is a 24-bit encoding
going to be faster to process, unless you're running on some terribly
obscure architecture that natively handles 24-bit words.
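To make the size argument concrete, here is a small sketch that counts bytes for a few sample strings. The fixed 3-bytes-per-code-point "24-bit" encoding is hypothetical (Unicode defines no such encoding form), and the sample strings are my own picks, not from any corpus.

```python
# Encoded size of short samples in UTF-8, UTF-16 and a HYPOTHETICAL
# fixed-width 24-bit encoding (3 bytes per code point). The 24-bit form
# is assumed purely for comparison; Unicode defines no such encoding.

SAMPLES = {
    "English": "Hello, world",
    "Russian": "\u041f\u0440\u0438\u0432\u0435\u0442",   # Cyrillic, U+04xx
    "Japanese": "\u65e5\u672c\u8a9e",                     # CJK, inside the BMP
    "Gothic": "\U00010330\U00010331\U00010332",          # dead script, outside the BMP
}

def encoded_sizes(text: str) -> dict:
    """Byte counts for one string under each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" avoids the byte-order mark that plain "utf-16" prepends
        "utf-16": len(text.encode("utf-16-le")),
        # len() on a Python str counts code points
        "24-bit": 3 * len(text),
    }

if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:9} {encoded_sizes(text)}")
```

Only the Gothic sample comes out smaller in the 24-bit form (9 bytes against 12 for both UTF-8 and UTF-16); for the Japanese sample it merely ties UTF-8 and loses to UTF-16.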

As someone else pointed out, it's not 24 bits, it's roughly 20.1: the code
space runs from U+0000 to U+10FFFF, which is 1,114,112 code points.
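The 20.1 figure is just the base-2 logarithm of the number of code points; storing the largest one as a plain integer takes 21 bits. A quick check:

```python
import math

# Unicode code points run from U+0000 to U+10FFFF:
# 17 planes of 65,536 code points each.
CODE_SPACE = 0x10FFFF + 1

bits_of_information = math.log2(CODE_SPACE)   # about 20.09
bits_per_scalar = (0x10FFFF).bit_length()     # bits to store the largest code point

print(CODE_SPACE, round(bits_of_information, 2), bits_per_scalar)
```

This prints `1114112 20.09 21`.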

As for compression, a comparison of compression formats across various
Unicode encodings[1] found that compression wiped out most of the
difference between the encodings.
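If you want to try this on your own text, a minimal sketch follows. It uses zlib (DEFLATE) from the Python standard library as a stand-in, which is an assumption on my part: it is not necessarily one of the compressors in the cited comparison, and the sample string is a made-up placeholder, so no particular result is implied.

```python
import zlib

# Encodings to compare; the "-le" variants omit the byte-order mark.
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def compressed_sizes(text: str, level: int = 9) -> dict:
    """Map each encoding to (raw byte count, DEFLATE-compressed byte count)."""
    result = {}
    for enc in ENCODINGS:
        raw = text.encode(enc)
        packed = zlib.compress(raw, level)
        # sanity check: the round trip must be lossless
        assert zlib.decompress(packed).decode(enc) == text
        result[enc] = (len(raw), len(packed))
    return result

if __name__ == "__main__":
    # Hypothetical placeholder sample; substitute a real corpus for meaningful numbers.
    sample = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8\u3002" * 200
    for enc, (raw, packed) in compressed_sizes(sample).items():
        print(f"{enc:10} raw={raw:6} compressed={packed:6}")
```

Highly repetitive input like the placeholder shrinks dramatically in every encoding, so it says little about the claim; real prose is the fair test.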

[1] http://www.cs.fit.edu/~ryan/compress/


