From: "J-P. Rosen"
Newsgroups: comp.lang.ada
Subject: Re: GNAT vs UTF-8 source file names
Date: Fri, 7 Jul 2017 10:19:53 +0200

On 06/07/2017 at 17:18, Shark8 wrote:
> I'm not saying it isn't complicated; I'm saying that it could, and
> should, have been done better.

I'm willing to accept this kind of statement only from people who
participated in the design...

> Instead we get a bizarre
> Frankenstein's-monster of techniques where some character-glyphs are
> precomposed (with duplicates across multiple languages) and
> Zalgo-script is a thing. (see: https://eeemo.net/ )

Yes, the representation of characters is not unique. It's a compromise
between compactness, compatibility, exhaustiveness...

> Not only that, but there's the problem of strings; instead of doing
> something sensible ("but wasteful"*) by designing a "multilanguage
> string" that partitioned strings by language.
> Ex:

This is total confusion. Unicode is about coded sets and encodings; it
has nothing to do with languages and internationalization.

>> The unifying principle is the normalization forms. The fact that
>> there are several normalization forms comes from the difference
>> between human and computer needs.
>
> Perhaps so, but there ought to be a way to identify such a context
> rather than just throwing these normalized forms in the UTF-string
> blender, shrugging, and handing it off to the programmers as "not my
> problem".

Another confusion: normalization forms have nothing to do with encodings
(UTF or not). Normalization provides a unique representation of
composite characters that may be represented in several ways.

> I mean as a counter-example ASN.1 has normalizing encodings like DER
> and CER, but these are (a) usually distinguished by being defined by
> their particular encoding, and when they aren't (b) are proper
> subsets of BER. [Much like subtypes in Ada and how we can use Natural
> & Positive for better describing our problem, but can use Integer
> when needed (ie foreign interfacing where the constraint might not be
> guaranteed).]

I don't follow you here. ASN.1 is a representation of structured data,
and AFAIU does not specify which coded set is used.

--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr
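A small illustration of the point that normalization and encoding are separate layers. This is a minimal sketch in Python, chosen only because its standard `unicodedata` module implements the Unicode normalization forms; nothing here is GNAT- or Ada-specific. The helper names `same_text` and `code_points` and the sample strings are hypothetical. It shows "é" written as one precomposed code point or as "e" plus a combining accent, both reduced to a single representation by NFC/NFD; NFKC additionally folding a compatibility character (the U+FB01 "fi" ligature) that NFC leaves alone; and encoding to UTF-8 or UTF-16 happening only afterwards, on an already normalized string.

```python
import unicodedata

# Two spellings of "é" (hypothetical example data):
PRECOMPOSED = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
DECOMPOSED = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def code_points(s: str) -> list[str]:
    """Show a string as its sequence of code points, e.g. ['U+00E9']."""
    return [f"U+{ord(c):04X}" for c in s]


if __name__ == "__main__":
    # A raw comparison sees two different code point sequences.
    print(PRECOMPOSED == DECOMPOSED)                               # False
    # After canonical normalization they compare equal.
    print(same_text(PRECOMPOSED, DECOMPOSED))                      # True
    print(code_points(unicodedata.normalize("NFC", DECOMPOSED)))   # ['U+00E9']
    print(code_points(unicodedata.normalize("NFD", PRECOMPOSED)))  # ['U+0065', 'U+0301']

    # Compatibility normalization (NFKC) also folds the U+FB01 "fi" ligature,
    # which canonical normalization (NFC) leaves unchanged.
    print(code_points(unicodedata.normalize("NFC", "\ufb01")))     # ['U+FB01']
    print(unicodedata.normalize("NFKC", "\ufb01"))                 # fi

    # Encoding is a separate, later step: one NFC string, serialized two ways.
    nfc = unicodedata.normalize("NFC", DECOMPOSED)
    print(nfc.encode("utf-8"))       # b'\xc3\xa9'
    print(nfc.encode("utf-16-le"))   # b'\xe9\x00'
```

The choice of normalization form decides which code points represent the text; the choice of UTF decides how those code points become bytes. Either choice can be changed without touching the other.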