comp.lang.ada
From: "Lawrence D’Oliveiro" <ldo@nz.invalid>
Subject: Re: Ada 202x; 2022; and 2012 and Unicode package (UTF-nn encodings handling)
Date: Wed, 3 Sep 2025 04:10:47 -0000 (UTC)
Message-ID: <1098f46$u638$2@dont-email.me>
In-Reply-To: 10981js$rmq1$1@dont-email.me

On Tue, 2 Sep 2025 18:20:09 -0600, Alex // nytpu wrote:

> (This whole mess is because UCS-2 and the UCS-2->UTF-16 transition was a
> hellish mess caused by extreme lack of foresight and it's horrible they
> saddled everyone, including people not using UTF-16, with this crap.

I gather the basic problem was that Unicode was originally going to be a 
fixed-length 16-bit code, and that was that. And so early adopters 
(Windows NT and Java among them) built UCS-2 right into their DNA.

Then, with Unicode 2.0, I believe it was, they went “on second thought, 
let’s go beyond our original brief and start including all kinds of other 
things as well” ... and UCS-2 had to become UTF-16 ...

> UTF-16 and its surrogate pairs is also what's responsible for the
> maximum scalar value being 0x0010_FFFF instead of 0xFFFF_FFFF even
> though UTF-32 and UTF-8 and even goddamn UTF-EBCDIC and the weird
> encoding the Chinese government came up with can all trivially encode
> full 32-bit values)

I wondered about that limit ...
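
For what it's worth, the arithmetic behind that limit can be sketched in a
few lines of Python. This is only an illustration of the standard UTF-16
surrogate scheme; the function names to_surrogates and from_surrogates are
made up for this post, not taken from any library.

```python
def to_surrogates(cp):
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000              # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the code point it represents."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest pair (0xDBFF, 0xDFFF) gives the largest value UTF-16 can carry.
MAX_SCALAR = from_surrogates(0xDBFF, 0xDFFF)
print(hex(MAX_SCALAR))
```

Two 10-bit halves give 20 bits of offset on top of 0x10000, so the pair
mechanism tops out at 0x10FFFF, and the other encodings were capped to match.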

> Plus conveniently Ada doesn't have routines for normalization, but can't
> hold that against it since neither does any other programming language
> because the lookup tables required are like 20 MiB even when optimized
> for space.

I think Python has them, as unicodedata.normalize() in its standard library
<https://docs.python.org/3/library/unicodedata.html>. But then, on 
platforms with decent package management, that data can be shared with 
other installed packages that require it as well.
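
A minimal sketch of what that looks like in practice, using only the
standard-library unicodedata module; the sample strings are my own choice.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two spellings look the same on screen but compare unequal as strings.
print(composed == decomposed)

# NFC composes where it can; NFD decomposes where it can.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)

# The Unicode database version bundled with this interpreter.
print(unicodedata.unidata_version)
```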

> Plus you shouldn't normalize text other than performing actions like
> substring matching, equality tests, or sorting---and even if you
> normalize when performing those, *when possible* you should store the
> unnormalized original for display/output afterwards.

I thought it was always safe to store decomposed versions of everything.
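
One case I would want to check before relying on that is a character with a
singleton decomposition, such as U+212B ANGSTROM SIGN. The sketch below
compares on normalized copies while keeping the original around; same_text
is a hypothetical helper name, not a library function.

```python
import unicodedata


def same_text(a, b):
    """Compare two strings on their NFD forms, leaving the inputs untouched."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


original = "\u212b"                                 # ANGSTROM SIGN
stored = unicodedata.normalize("NFD", original)     # what a decomposing store keeps
recomposed = unicodedata.normalize("NFC", stored)   # best-effort recomposition

# Canonically equivalent, so matching on normalized copies works ...
print(same_text(original, "\u00c5"))

# ... but the code point that was originally typed is not recovered.
print(recomposed == original)
```

So decomposed storage keeps canonical equivalence, but it does not round-trip
back to the exact input in every case.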

