* KC normalization form for text
From: Hibou57 (Yannick Duchêne) @ 2008-02-26 4:03 UTC
LRM 2.1 4.1/2 :
> The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section
> 24 of ISO/IEC 10646:2003) is implementation defined.
Does it really apply to character literals, string literals,
and comments as well?
If it does, this is very restrictive (mainly for character and string
literals).
* Re: KC normalization form for text
From: Randy Brukardt @ 2008-02-26 21:08 UTC

"Hibou57 (Yannick Duchêne)" <yannick_duchene@yahoo.fr> wrote in message
news:2c2354d7-b427-4420-8161-7da417e92505@34g2000hsz.googlegroups.com...
> LRM 2.1 4.1/2 :
> > The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section
> > 24 of ISO/IEC 10646:2003) is implementation defined.
>
> Does it really apply to character literals, string literals,
> and comments as well?
>
> If it does, this is very restrictive (mainly for character and string
> literals).

Yes, of course it applies. The reason for the wording is that some Unicode
documents insist that programs that are not normalized are dangerous, and
strongly recommend that everything be normalized. Initially, we required
that the program be converted by the compiler into a normalized form before
processing. But such a conversion has its own problems (and would be a lot
of work for compiler implementers), so in the end we decided to cop out
with the statement you see above.

I would suspect that most compilers simply don't care about unnormalized
programs, and everything will work fine (without any normalization being
applied, nor any rejection). But the rule allows a compiler especially
worried about security to do normalization and/or code rejection.

In any case, what a compiler does is supposed to be documented. (That's the
difference between "implementation-defined" and "unspecified" in the Ada
standard.) So you can depend on whatever the compiler does, you just can't
*assume* that the code will be portable.

Randy.

P.S. If you want a more definitive answer, you have to ask the ARG by
sending a formal question to Ada-Comment@Ada-Auth.org. I suggest joining
the mailing list if you do that, so that you see any replies (especially
requests for more information).
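
[Editor's sketch, not part of the original message.] The two options mentioned above for a security-minded compiler, normalizing the source or rejecting it, both start from the same check: is the text already in Normalization Form KC? Below is a minimal illustration in Python using the standard `unicodedata` module. The helper name `check_nfkc` and the sample declaration are made up for this example; nothing here describes what any actual Ada compiler does.

```python
import unicodedata


def check_nfkc(text):
    """Return (already_normalized, normalized_text) for Normalization Form KC.

    Hypothetical helper for illustration only.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return normalized == text, normalized


# U+FB01 (LATIN SMALL LIGATURE FI) is a compatibility character,
# so a source line containing it is not in NFKC.
source_line = "Pro\ufb01le : Integer;"
ok, fixed = check_nfkc(source_line)

if not ok:
    # A tool could reject the line here, or continue with `fixed` instead.
    print("not in NFKC; normalized form:", fixed)
```

Either policy is allowed by the rule as quoted; the only requirement discussed in this thread is that the compiler documents which one it follows.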
* Re: KC normalization form for text
From: Hibou57 (Yannick Duchêne) @ 2008-02-27 0:14 UTC

> Yes, of course it applies. The reason for the wording is that some Unicode
> documents insist that programs that are not normalized are dangerous, and
> strongly recommend that everything be normalized.

I know about it: some characters and their historically compatible forms
may confuse the reader. For this reason I first thought that this was to
be applied to identifiers only.

> Initially, we required
> that the program be converted by the compiler into a normalized form before
> processing. But such a conversion has its own problems

Yes, there will be:
1) string concatenation will most of the time not behave as expected.
2) character literals may no longer be valid character literals, because a
   compatibility decomposition may create two characters instead of one,
   and this remains so even after canonical recomposition.

> I would suspect that most compilers simply don't care about unnormalized
> programs, and everything will work fine (without any normalization being
> applied, nor any rejection). But the rule allows a compiler especially
> worried about security to do normalization and/or code rejection.

I think the best is to reject any text which contains an unnormalized
identifier. For literals, this depends on the end use of those literals.
Well, when I think about it, it is true that an unnormalized string literal
may be unsafe as well, if it is to be used in a system command (as an
example). But the restriction is heavy if strings are to be used for
display or the like (but for identifiers, this should always be rejected I
think, because a conversion will be confusing for the user anyway).

> P.S. If you want a more definitive answer, you have to ask the ARG by
> sending a formal question to ---@---.

Ok, I will ask the question there.

Thanks for these comments

Yannick
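
[Editor's sketch, not part of the original message.] The two numbered problems above can be reproduced with Python's standard `unicodedata` module. The reading of point 1 used here (two strings that are each in NFKC can concatenate to a string that is not) is an assumption about what was meant; the helper names `nfkc` and `is_nfkc` are made up for this example.

```python
import unicodedata


def nfkc(text):
    """Normalize text to Normalization Form KC."""
    return unicodedata.normalize("NFKC", text)


def is_nfkc(text):
    """True if text is unchanged by NFKC normalization."""
    return nfkc(text) == text


# Point 2: a single-character literal can expand to several characters.
# U+FB01 (LATIN SMALL LIGATURE FI) decomposes to "f" + "i", and no
# canonical recomposition merges them back into one character.
ligature = "\ufb01"
expanded = nfkc(ligature)
print(len(ligature), len(expanded))

# Point 1 (assumed reading): normalization is not closed under
# concatenation. "e" and U+0301 (COMBINING ACUTE ACCENT) are each in
# NFKC on their own, but "e" followed by U+0301 composes to U+00E9.
left, right = "e", "\u0301"
joined = left + right
print(is_nfkc(left), is_nfkc(right), is_nfkc(joined))
```

So a compiler that normalized literals would change both the length of a string literal and the validity of a character literal, which is the restriction being objected to in this message.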