From: "J-P. Rosen" <rosen@adalog.fr>
Subject: Re: GNAT vs UTF-8 source file names
Date: Fri, 7 Jul 2017 10:19:53 +0200
Message-ID: <ojng04$q8c$1@dont-email.me>
In-Reply-To: <c65d0a6b-8dbb-4222-936f-838438e8d5bd@googlegroups.com>
On 06/07/2017 at 17:18, Shark8 wrote:
> I'm not saying it isn't complicated; I'm saying that it could, and
> should, have been done better.
I'm willing to accept this kind of statement only from people who
participated in the design...
> Instead we get a bizarre
> Frankenstein's-monster of techniques where some character-glyphs are
> precomposed (with duplicates across multiple languages) and
> Zalgo-script is a thing. (see: https://eeemo.net/ )
Yes, the representation of characters is not unique. It's a compromise
between compactness, compatibility, exhaustiveness...
> Not only that, but there's the problem of strings; instead of doing
> something sensible ("but wasteful"*) by designing a "multilanguage
> string" that partitioned strings by language. Ex:
This is total confusion. Unicode is about coded sets and encodings; it
has nothing to do with languages and internationalization.
>> The unifying principle is the normalization forms. The fact that
>> there are several normalization forms comes from the difference
>> between human and computer needs.
>
> Perhaps so, but there ought to be a way to identify such a context
> rather than just throwing these normalized forms in the UTF-string
> blender, shrugging, and handing it off to the programmers as "not my
> problem".
Another confusion: normalization forms have nothing to do with encodings
(UTF or not). Normalization provides a unique representation of
composite characters that may be represented in several ways.
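To illustrate the distinction, here is a minimal sketch using Python's
standard unicodedata module (the variable names are chosen only for
illustration). It shows that normalization operates on sequences of code
points, and that choosing an encoding such as UTF-8 or UTF-16 is a
separate, later step:

```python
import unicodedata

# Two code point sequences for the same composite character "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# As raw code point sequences they are different strings.
assert precomposed != decomposed

# Normalization picks one representation; no encoding is involved yet.
nfc = unicodedata.normalize("NFC", decomposed)   # composed form
nfd = unicodedata.normalize("NFD", precomposed)  # decomposed form
assert nfc == precomposed
assert nfd == decomposed

# Encoding comes afterwards: the same normalized string can be
# serialized with different encodings, giving different bytes.
utf8_bytes = nfc.encode("utf-8")
utf16_bytes = nfc.encode("utf-16-be")
assert utf8_bytes != utf16_bytes
```

Whichever normalization form is chosen, it can be applied before any
encoding is selected, which is why the two notions are independent.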
> I mean as a counter-example ASN.1 has normalizing encodings like DER
> and CER, but these are (a) usually distinguished by being defined by
> their particular encoding, and when they aren't (b) are proper
> subsets of BER. [Much like subtypes in Ada and how we can use Natural
> & Positive for better describing our problem, but can use Integer
> when needed (ie foreign interfacing where the constraint might not be
> guaranteed).]
I don't follow you here. ASN.1 is a representation of structured data,
and AFAIU does not specify which coded set is used.
--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr