KC normalization form for text

comp.lang.ada

* KC normalization form for text
@ 2008-02-26  4:03 Hibou57 (Yannick Duchêne)
  2008-02-26 21:08 ` Randy Brukardt
  0 siblings, 1 reply; 3+ messages in thread
From: Hibou57 (Yannick Duchêne) @ 2008-02-26  4:03 UTC (permalink / raw)


LRM 2.1 4.1/2 :
> The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section
> 24 of ISO/IEC 10646:2003) is implementation defined.

Does it really apply to character literals, string literals
and comments as well?

If it does, this is very restrictive (mainly for character and string
literals).





* Re: KC normalization form for text
  2008-02-26  4:03 KC normalization form for text Hibou57 (Yannick Duchêne)
@ 2008-02-26 21:08 ` Randy Brukardt
  2008-02-27  0:14   ` Hibou57 (Yannick Duchêne)
  0 siblings, 1 reply; 3+ messages in thread
From: Randy Brukardt @ 2008-02-26 21:08 UTC (permalink / raw)



"Hibou57 (Yannick Duchêne)" <yannick_duchene@yahoo.fr> wrote in message
news:2c2354d7-b427-4420-8161-7da417e92505@34g2000hsz.googlegroups.com...
> LRM 2.1 4.1/2 :
> > The semantics of an Ada program whose text is not in Normalization Form KC (as defined by section
> > 24 of ISO/IEC 10646:2003) is implementation defined.
>
> Does it really apply as well on character literals, string literals
> and comments ?
>
> If it does, this is very restrictive (mainly for character and string
> literals).

Yes, of course it applies. The reason for the wording is that some Unicode
documents insist that programs that are not normalized are dangerous, and
strongly recommend that everything be normalized. Initially, we required
that the program be converted by the compiler into a normalized form before
processing. But such a conversion has its own problems (and would be a lot
of work for compiler implementers), so in the end we decided to cop out with
the statement you see above.

I would suspect that most compilers simply don't care about unnormalized
programs, and everything will work fine (without any normalization being
applied, nor any rejection). But the rule allows a compiler especially
worried about security to do normalization and/or code rejection.
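
To make the options above concrete, here is a minimal Python sketch of the
three behaviours a compiler front end could choose between: leave the text
alone, normalize it, or reject it. The function name check_source and the
policy names are invented for this illustration; it does not describe what
any particular Ada compiler does.

```python
import unicodedata


def check_source(text: str, policy: str = "reject") -> str:
    """Apply a hypothetical front-end policy to program text.

    policy is one of "ignore", "normalize" or "reject" (names made up
    for this sketch).
    """
    normalized = unicodedata.normalize("NFKC", text)
    if normalized == text:
        # Already in Normalization Form KC: nothing to decide.
        return text
    if policy == "normalize":
        # Silently convert the text before further processing.
        return normalized
    if policy == "reject":
        # Refuse to process text that is not in NFKC.
        raise ValueError("program text is not in Normalization Form KC")
    # "ignore": process the text exactly as written.
    return text
```

Whichever branch an implementation picks, the point of the rule is that the
choice is documented rather than left unspecified.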

In any case, what a compiler does is supposed to be documented. (That's the
difference between "implementation-defined" and "unspecified" in the Ada
standard.) So you can depend on whatever the compiler does, you just can't
*assume* that the code will be portable.

                                          Randy.

P.S. If you want a more definitive answer, you have to ask the ARG by
sending a formal question to Ada-Comment@Ada-Auth.org. I suggest joining the
mailing list if you do that, so that you see any replies (especially
requests for more information).







* Re: KC normalization form for text
  2008-02-26 21:08 ` Randy Brukardt
@ 2008-02-27  0:14   ` Hibou57 (Yannick Duchêne)
  0 siblings, 0 replies; 3+ messages in thread
From: Hibou57 (Yannick Duchêne) @ 2008-02-27  0:14 UTC (permalink / raw)


> Yes, of course it applies. The reason for the wording is that some Unicode
> documents insist that programs that are not normalized are dangerous, and
> strongly recommend that everything be normalized.
I know about it: some characters and their historical compatibility
forms may confuse the reader. For this reason I first thought
that this was to be applied to identifiers only.

> Initially, we required
> that the program be converted by the compiler into a normalized form before
> processing. But such a conversion has its own problems
Yes, for example:
1) string concatenation will often not behave as expected, because
joining two normalized strings does not always give a normalized string.
2) a character literal may no longer be a valid character literal, because a
compatibility decomposition may create two characters instead of one,
and this remains so even after canonical recomposition.
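
Both problems can be reproduced with a short Python sketch using the
standard unicodedata module. The helper names below are made up for this
illustration, and the sample characters (a combining acute accent and the
"fi" ligature) are just convenient examples, not cases taken from the
Ada standard.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Shorthand for Normalization Form KC."""
    return unicodedata.normalize("NFKC", s)


def concatenation_breaks_nfkc(left: str, right: str) -> bool:
    """True if both parts are in NFKC but their concatenation is not."""
    both_normalized = nfkc(left) == left and nfkc(right) == right
    joined = left + right
    return both_normalized and nfkc(joined) != joined


def length_after_nfkc(ch: str) -> int:
    """Number of characters a single character becomes under NFKC."""
    return len(nfkc(ch))


# 1) "e" and U+0301 COMBINING ACUTE ACCENT are each normalized on their
#    own, but joined they compose into the single character U+00E9.
print(concatenation_breaks_nfkc("e", "\u0301"))

# 2) U+FB01 LATIN SMALL LIGATURE FI has a compatibility decomposition
#    into "f" + "i", and no recomposition brings the ligature back, so a
#    one-character literal would become two characters.
print(length_after_nfkc("\ufb01"))
```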

> I would suspect that most compilers simply don't care about unnormalized
> programs, and everything will work fine (without any normalization being
> applied, nor any rejection). But the rule allows a compiler especially
> worried about security to do normalization and/or code rejection.
I think the best is to reject any text which contains an unnormalized
identifier.

For literals, this depends on the end use of those literals. Well,
when I think about it, it is true that an unnormalized string literal
may be unsafe as well, if it is to be used in a system command (for
example). But the restriction is heavy if strings are to be used for
display or the like (for identifiers, though, I think unnormalized text
should always be rejected, because a conversion would in any case be
confusing for the user).
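
The identifier concern can be illustrated with a small Python sketch: two
spellings that are different sequences of characters but become identical
once NFKC is applied. The function name collides_under_nfkc and the sample
identifiers are hypothetical, and the sketch deliberately ignores the other
rules Ada applies to identifiers (such as case insensitivity).

```python
import unicodedata


def collides_under_nfkc(a: str, b: str) -> bool:
    """True if a and b differ as written but are equal after NFKC."""
    same_after = (unicodedata.normalize("NFKC", a)
                  == unicodedata.normalize("NFKC", b))
    return a != b and same_after


# "\ufb01le" starts with the single ligature character U+FB01, while
# "file" uses the two letters "f" and "i". A reader is unlikely to tell
# them apart, yet they are different character sequences.
print(collides_under_nfkc("\ufb01le", "file"))

# Fullwidth "V" (U+FF36) followed by "alue" versus plain "Value".
print(collides_under_nfkc("\uff36alue", "Value"))
```

A compiler that silently normalized would merge such names, while one that
compared them as written would keep them distinct; rejecting the
unnormalized spelling avoids both surprises.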

> P.S. If you want a more definitive answer, you have to ask the ARG by
> sending a formal question to ---@---.
Ok, I will ask the question there.

Thanks for these comments

Yannick


