From: Hibou57 (Yannick Duchêne)
Newsgroups: comp.lang.ada
Subject: Re: KC normalization form for text
Date: Tue, 26 Feb 2008 16:14:58 -0800 (PST)
Message-ID: <12ac9416-6e2c-4e94-9f34-58d36272b353@d4g2000prg.googlegroups.com>
References: <2c2354d7-b427-4420-8161-7da417e92505@34g2000hsz.googlegroups.com>

> Yes, of course it applies. The reason for the wording is that some Unicode
> documents insist that programs that are not normalized are dangerous, and
> strongly recommend that everything be normalized.

I know about that: some characters and their historical compatibility forms
may confuse the reader. For this reason, I first thought that this was to be
applied to identifiers only.

> Initially, we required
> that the program be converted by the compiler into a normalized form before
> processing. But such a conversion has its own problems

Yes, there will be:
1) String concatenation will, most of the time, not behave as expected.
2) A character literal may no longer be a valid character literal, because a
   compatibility decomposition may produce two characters instead of one,
   and this remains so even after canonical recomposition.

> I would suspect that most compilers simply don't care about unnormalized
> programs, and everything will work fine (without any normalization being
> applied, nor any rejection). But the rule allows a compilers especially
> worried about security to do normalization and/or code rejection.

I think the best option is to reject any text that contains an unnormalized
identifier. For literals, it depends on how those literals are ultimately
used. Well, when I think about it, it is true that an unnormalized string
literal may be unsafe as well, if it is to be used in a system command (for
example). But the restriction is heavy if strings are only to be used for
display or the like (for identifiers, though, I think unnormalized text
should always be rejected, because a conversion would be confusing for the
user anyway).

> P.S. If you want a more definitive answer, you have to ask the ARG by
> sending a formal question to ---@---.

OK, I will ask the question there.

Thanks for these comments,
Yannick
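
To make points 1) and 2) above concrete, here is a minimal sketch. It is written in Python rather than Ada, only because Python's standard unicodedata module provides normalization out of the box. The helper names is_nfkc and check_identifier are hypothetical, chosen for illustration; they are not taken from any compiler or from the Ada standard, and the rejection policy shown is just one possible reading of "reject unnormalized identifiers".

```python
import unicodedata


def is_nfkc(text: str) -> bool:
    """Return True when text is already in normalization form KC."""
    return unicodedata.normalize("NFKC", text) == text


def check_identifier(name: str) -> str:
    """Hypothetical policy: reject an unnormalized identifier, never rewrite it."""
    if not is_nfkc(name):
        raise ValueError(f"identifier {name!r} is not in form KC")
    return name


# Point 2: a single character can turn into two under form KC.
ligature = "\ufb01"  # LATIN SMALL LIGATURE FI, one code point
folded = unicodedata.normalize("NFKC", ligature)
print(len(ligature), len(folded))  # 1 2

# Point 1: two strings that are each in form KC can concatenate into a
# string that is not, so normalizing the parts is not the same as
# normalizing the whole.
base = "e"
accent = "\u0301"  # COMBINING ACUTE ACCENT
joined = base + accent
print(is_nfkc(base), is_nfkc(accent), is_nfkc(joined))  # True True False

# Rejection instead of silent conversion for identifiers.
check_identifier("file")  # accepted
try:
    check_identifier("\ufb01le")  # spelled with the ligature
except ValueError as error:
    print(error)
```

The ligature case shows why converting a character literal can leave something that is no longer a single character, and the "e" plus combining accent case shows why concatenation of normalized strings needs a second look.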