comp.lang.ada
From: "J-P. Rosen" <rosen@adalog.fr>
Subject: Re: GNAT vs UTF-8 source file names
Date: Fri, 7 Jul 2017 10:19:53 +0200
Message-ID: <ojng04$q8c$1@dont-email.me>
In-Reply-To: <c65d0a6b-8dbb-4222-936f-838438e8d5bd@googlegroups.com>

On 06/07/2017 at 17:18, Shark8 wrote:
> I'm not saying it isn't complicated; I'm saying that it could, and
> should, have been done better.
I'm willing to accept this kind of statement only from people who
participated in the design...

> Instead we get a bizarre
> Frankenstein's-monster of techniques where some character-glyphs are
> precomposed (with duplicates across multiple languages) and
> Zalgo-script is a thing. (see: https://eeemo.net/ )
Yes, the representation of characters is not unique. It's a compromise
between compactness, compatibility, and exhaustiveness...

> Not only that, but there's the problem of strings; instead of doing
> something sensible ("but wasteful"*) by designing a "multilanguage
> string" that partitioned strings by language. Ex:
This is total confusion. Unicode is about coded character sets and
encodings; it has nothing to do with languages and internationalization.
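
To illustrate the distinction between the coded set and its encodings, here is a minimal sketch in Python, chosen only for brevity; the variable names are illustrative. One character has exactly one code point in the coded set, but each encoding form serializes that number into different bytes, and none of this says anything about which language the text belongs to.

```python
# One abstract character has one code point in the Unicode coded set...
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
code_point = ord(ch)

# ...but several encodings (byte serializations) of that same code point.
encodings = {
    "utf-8": ch.encode("utf-8"),
    "utf-16-be": ch.encode("utf-16-be"),
    "utf-32-be": ch.encode("utf-32-be"),
}

print(f"U+{code_point:04X}")       # U+00E9
for name, data in encodings.items():
    print(name, data.hex())        # c3a9, 00e9, 000000e9

# Every encoding decodes back to the same code point.
assert all(data.decode(name) == ch for name, data in encodings.items())
```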

>> The unifying principle is the normalization forms. The fact that
>> there are several normalization forms comes from the difference
>> between human and computer needs.
> 
> Perhaps so, but there ought to be a way to identify such a context
> rather than just throwing these normalized forms in the UTF-string
> blender, shrugging, and handing it off to the programmers as "not my
> problem".
Another confusion: normalization forms have nothing to do with encodings
(UTF or not). Normalization provides a unique representation of
composite characters that can be written as more than one sequence of
code points.
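
A minimal sketch of that point, again in Python only because its standard `unicodedata` module implements the Unicode normalization forms; the variable names are illustrative. The precomposed and decomposed forms of "é" are different code-point sequences; NFC and NFD each map both to a single canonical sequence, and this happens on code points, independently of whichever encoding is used afterwards.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Same visible character, different code-point sequences.
print(precomposed == decomposed)  # False

# Each normalization form picks one canonical sequence.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
nfd_a = unicodedata.normalize("NFD", precomposed)
nfd_b = unicodedata.normalize("NFD", decomposed)
print(nfc_a == nfc_b, [hex(ord(c)) for c in nfc_a])  # True ['0xe9']
print(nfd_a == nfd_b, [hex(ord(c)) for c in nfd_a])  # True ['0x65', '0x301']

# Normalization is applied to code points; any encoding can follow.
for form in ("NFC", "NFD"):
    s = unicodedata.normalize(form, decomposed)
    print(form, s.encode("utf-8").hex(), s.encode("utf-16-be").hex())
# NFC c3a9 00e9
# NFD 65cc81 00650301
```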

> I mean as a counter-example ASN.1 has normalizing encodings like DER
> and CER, but these are (a) usually distinguished by being defined by
> their particular encoding, and when they aren't (b) are proper
> subsets of BER. [Much like subtypes in Ada and how we can use Natural
> & Positive for better describing our problem, but can use Integer
> when needed (ie foreign interfacing where the constraint might not be
> guarenteed).]
I don't follow you here. ASN.1 is a representation of structured data,
and AFAIU does not specify which coded set is used.


-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr


Thread overview: 22+ messages
2017-04-30 17:10 GNAT vs UTF-8 source file names Simon Wright
2017-06-17 17:20 ` Simon Wright
2017-06-27 13:22   ` Jacob Sparre Andersen
2017-06-27 21:45     ` Niklas Holsti
2017-06-28  5:05       ` G.B.
2017-07-04 13:57   ` Simon Wright
2017-07-04 17:30     ` Shark8
2017-07-04 18:08       ` Dennis Lee Bieber
2017-07-05  5:25       ` J-P. Rosen
2017-07-06 15:18         ` Shark8
2017-07-07  8:19           ` J-P. Rosen [this message]
2017-07-05  5:21     ` J-P. Rosen
2017-07-05  9:47       ` Simon Wright
2017-07-05 11:20         ` J-P. Rosen
2017-07-05 18:42           ` Randy Brukardt
2017-07-06 18:43           ` Simon Wright
2017-07-07  8:26             ` J-P. Rosen
2017-07-07 11:01               ` Simon Wright
2017-07-07 11:49                 ` Jacob Sparre Andersen
2017-07-07 19:44                   ` Randy Brukardt
2017-07-07 19:40                 ` Randy Brukardt
2017-07-07 21:02                   ` Simon Wright