From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,677965e410b8ed18
X-Google-Attributes: gid103376,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Path: 
 g2news1.google.com!postnews.google.com!d4g2000prg.googlegroups.com!not-for-mail
From: "=?ISO-8859-1?Q?Hibou57_(Yannick_Duch=EAne)?="
 <yannick_duchene@yahoo.fr>
Newsgroups: comp.lang.ada
Subject: Re: KC normalization form for text
Date: Tue, 26 Feb 2008 16:14:58 -0800 (PST)
Organization: http://groups.google.com
Message-ID: <12ac9416-6e2c-4e94-9f34-58d36272b353@d4g2000prg.googlegroups.com>
References: <2c2354d7-b427-4420-8161-7da417e92505@34g2000hsz.googlegroups.com>
	<fq1v70$jin$1@jacob-sparre.dk>
NNTP-Posting-Host: 86.66.190.114
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Trace: posting.google.com 1204071298 30816 127.0.0.1 (27 Feb 2008 00:14:58
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Wed, 27 Feb 2008 00:14:58 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: d4g2000prg.googlegroups.com; posting-host=86.66.190.114;
	posting-account=vrfdLAoAAAAauX_3XwyXEwXCWN3A1l8D
User-Agent: G2/1.0
X-HTTP-UserAgent: Opera/9.23 (Windows NT 5.1; U; fr),gzip(gfe),gzip(gfe)
Xref: g2news1.google.com comp.lang.ada:20113
Date: 2008-02-26T16:14:58-08:00
List-Id: <comp.lang.ada>

> Yes, of course it applies. The reason for the wording is that some Unicode
> documents insist that programs that are not normalized are dangerous, and
> strongly recommend that everything be normalized.
I know about it: some characters and their historically compatible
forms may confuse the reader. For this reason I first thought
that this was to be applied to identifiers only.

> Initially, we required
> that the program be converted by the compiler into a normalized form before
> processing. But such a conversion has its own problems
Yes, there will be:
1) string concatenation will most of the time not behave as expected.
2) character literals may no longer be valid character literals, because a
compatibility decomposition may create two characters instead of one,
and this remains so even after canonical recomposition.
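
To make the two points above concrete, here is a small sketch in Python
(not Ada, only because Python's standard unicodedata module makes it easy
to try). The helper name is_nfkc and the sample characters are my own
choices for illustration. It shows one way concatenation can surprise, and
a single character that becomes two under NFKC.

```python
import unicodedata


def is_nfkc(text: str) -> bool:
    """Return True if text is already in normalization form KC."""
    return unicodedata.normalize("NFKC", text) == text


# Point 1: two strings that are each normalized, whose concatenation is not.
left = "a"
right = "\u0301"          # COMBINING ACUTE ACCENT on its own
joined = left + right     # composes to U+00E1 under NFKC

# Point 2: one character whose compatibility decomposition is two
# characters, with nothing to recompose them into afterwards.
ligature = "\ufb01"       # LATIN SMALL LIGATURE FI
expanded = unicodedata.normalize("NFKC", ligature)

print(is_nfkc(left), is_nfkc(right), is_nfkc(joined))  # True True False
print(len(ligature), len(expanded))                    # 1 2
```

So a character literal holding the ligature would no longer hold a single
character once the source text had been converted.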

> I would suspect that most compilers simply don't care about unnormalized
> programs, and everything will work fine (without any normalization being
> applied, nor any rejection). But the rule allows a compiler especially
> worried about security to do normalization and/or code rejection.
I think the best is to reject any text which contains an unnormalized
identifier.
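
As a rough sketch of what such a rejection could look like (again in
Python, using the standard unicodedata module; check_identifier is a
hypothetical name, and this is not how any actual Ada compiler is known to
do it):

```python
import unicodedata


def check_identifier(name: str) -> str:
    """Return name unchanged, or raise ValueError if it is not in NFKC."""
    normalized = unicodedata.normalize("NFKC", name)
    if normalized != name:
        # Reject rather than silently convert, so the user sees the problem.
        raise ValueError(
            f"identifier {name!r} is not normalized (NFKC form: {normalized!r})"
        )
    return name


check_identifier("file")           # accepted: already normalized

try:
    check_identifier("\ufb01le")   # starts with the FI ligature, looks like "file"
except ValueError as error:
    print(error)
```

The point of rejecting instead of converting is that the user is told
about the look-alike spelling rather than ending up with two identifiers
that are quietly merged.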

For literals, this depends on the end usage of those literals. Well,
when I think about it, it is true that an unnormalized string literal
may be unsafe as well, if it is to be used in a system command (as an
example). But the restriction is heavy if strings are to be used for
display or the like (for identifiers, though, this should always be
rejected I think, because a conversion will be confusing for the user
anyway).

> P.S. If you want a more definitive answer, you have to ask the ARG by
> sending a formal question to ---@---.
Ok, I will ask the question there.

Thanks for these comments

Yannick