From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "J-P. Rosen" <rosen@adalog.fr>
Newsgroups: comp.lang.ada
Subject: Re: GNAT vs UTF-8 source file names
Date: Fri, 7 Jul 2017 10:19:53 +0200
Organization: A noiseless patient Spider
Message-ID: <ojng04$q8c$1@dont-email.me>
References: <lytw55kei5.fsf@pushface.org> <lyefuia5ur.fsf@pushface.org>
 <lyeftw2tlc.fsf@pushface.org>
 <b4c7079c-8c00-4c7f-938f-87f031172923@googlegroups.com>
 <ojht0q$srt$1@dont-email.me>
 <c65d0a6b-8dbb-4222-936f-838438e8d5bd@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 7 Jul 2017 08:16:04 -0000 (UTC)
Injection-Info: mx02.eternal-september.org;
 posting-host="d52e56542d0c212d612845daa3d7c429";
	logging-data="26892"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX1/M0Fep9sZ5TNy/N2wJZgoQ"
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
In-Reply-To: <c65d0a6b-8dbb-4222-936f-838438e8d5bd@googlegroups.com>
Content-Language: fr
Cancel-Lock: sha1:zg7r8L9aqiJ/maKLBfhREPj0UsY=
Xref: news.eternal-september.org comp.lang.ada:47308
Date: 2017-07-07T10:19:53+02:00
List-Id: <comp.lang.ada>

On 06/07/2017 at 17:18, Shark8 wrote:
> I'm not saying it isn't complicated; I'm saying that it could, and
> should, have been done better.
I'm willing to accept this kind of statement only from people who
participated in the design...

> Instead we get a bizarre
> Frankenstein's-monster of techniques where some character-glyphs are
> precomposed (with duplicates across multiple languages) and
> Zalgo-script is a thing. (see: https://eeemo.net/ )
Yes, the representation of characters is not unique. It's a compromise
between compactness, compatibility, exhaustiveness...

> Not only that, but there's the problem of strings; instead of doing
> something sensible ("but wasteful"*) by designing a "multilanguage
> string" that partitioned strings by language. Ex:
This is total confusion. Unicode is about coded character sets and
encodings; it has nothing to do with languages and internationalization.
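
To illustrate the distinction between a coded character set and an
encoding, here is a minimal sketch in Python (my choice of language for
the illustration, using only the built-in str type). The character "€"
is just an arbitrary example: the coded set assigns it one code point,
and each encoding form serializes that code point differently.

```python
# The coded character set assigns one number (code point) to the character.
euro = "\u20ac"                      # EURO SIGN
code_point = ord(euro)               # the abstract code point, U+20AC

# Encodings are different byte serializations of that same code point.
utf8_bytes = euro.encode("utf-8")        # 3 bytes
utf16_bytes = euro.encode("utf-16-be")   # 2 bytes, big-endian, no BOM
utf32_bytes = euro.encode("utf-32-be")   # 4 bytes, big-endian, no BOM

# Decoding any of them gives back the same code point.
round_trip_ok = (
    utf8_bytes.decode("utf-8")
    == utf16_bytes.decode("utf-16-be")
    == utf32_bytes.decode("utf-32-be")
    == euro
)
```

Nothing in either step refers to a natural language; the language of
the text is a separate layer of information.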

>> The unifying principle is the normalization forms. The fact that
>> there are several normalization forms comes from the difference
>> between human and computer needs.
> 
> Perhaps so, but there ought to be a way to identify such a context
> rather than just throwing these normalized forms in the UTF-string
> blender, shrugging, and handing it off to the programmers as "not my
> problem".
Another confusion: normalization forms have nothing to do with encodings
(UTF or not). Normalization provides a unique representation for
composite characters that can otherwise be represented in several ways.
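
As a small sketch of this point, assuming Python and its standard
unicodedata module (chosen only for illustration; "é" is an arbitrary
example): the same composite character can arrive as one precomposed
code point or as a base letter plus a combining accent, and
normalization maps both to a single sequence, whatever encoding the
text was read from.

```python
import unicodedata

# Two code point sequences for the same composite character "é".
precomposed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the two sequences compare as different.
raw_equal = precomposed == decomposed

# Each normalization form gives one unique representation for both.
nfc_equal = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))
nfd_equal = (unicodedata.normalize("NFD", precomposed)
             == unicodedata.normalize("NFD", decomposed))

# The encoding the text came from plays no role: decode first,
# then normalize the resulting code points.
from_utf8 = b"e\xcc\x81".decode("utf-8")          # decomposed form, UTF-8
from_utf16 = b"\x00e\x03\x01".decode("utf-16-be") # decomposed form, UTF-16
same_after_nfc = (unicodedata.normalize("NFC", from_utf8)
                  == unicodedata.normalize("NFC", from_utf16)
                  == precomposed)
```

NFC and NFD are only two of the four forms; the compatibility forms
NFKC and NFKD are available through the same function.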

> I mean as a counter-example ASN.1 has normalizing encodings like DER
> and CER, but these are (a) usually distinguished by being defined by
> their particular encoding, and when they aren't (b) are proper
> subsets of BER. [Much like subtypes in Ada and how we can use Natural
> & Positive for better describing our problem, but can use Integer
> when needed (ie foreign interfacing where the constraint might not be
> guaranteed).]
I don't follow you here. ASN.1 is a representation of structured data,
and AFAIU does not specify which coded character set is used.


-- 
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr