"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
news:1a9k0vk46bqrq.1cx6cdld0wd9f$.dlg@40tude.net...
> On Fri, 29 Dec 2006 20:25:28 -0600, Randy Brukardt wrote:
>
> > For what it's worth, Ada says that all three of these represent the same
> > identifier. That's not ideal, but it's the best that we can do without
> > dropping into the character handling mess ourselves.
> >
> > This is even more interesting when you consider that there are alternative
> > spellings for reserved words. For instance "acceß" is identical to "access".
> > (See 2.3(5.c/2) in the AARM for more examples). We wrestled with that quite
> > a while before deciding that such identifiers had to be illegal
> > (2.3(5.3/2)); we didn't want them appearing in programs in place of reserved
> > words.
>
> Yuck. Would "acceβ" with Greek beta (β) and "if" with Cyrillic ? in it be
> valid identifiers?

Sure, the upper case of a Greek beta is still a Greek beta, it's not "SS"
(and doesn't look anything like "ss", either). I don't know much about
Cyrillic, so I don't know the answer to that (but I suspect you do).
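
To make the difference concrete, here is a small sketch in Python rather than Ada. It assumes that Python's built-in Unicode case mappings are a fair stand-in for the identifier-equivalence rule being discussed; it is not how any Ada compiler implements the check, and same_identifier is a made-up helper name.

```python
# Illustration only: Python's Unicode case mappings stand in for the
# identifier-equivalence rule discussed above (an assumption, not Ada's
# actual algorithm).

def same_identifier(a: str, b: str) -> bool:
    """Compare two spellings after Unicode full case folding."""
    return a.casefold() == b.casefold()

# German sharp s folds to "ss", so this spelling collides with "access".
print(same_identifier("acce\u00df", "access"))   # True

# Greek small beta folds to itself, so there is no collision.
print(same_identifier("acce\u03b2", "access"))   # False

# The upper-case mappings show the same split.
print("\u00df".upper() == "SS")        # True: sharp s becomes "SS"
print("\u03b2".upper() == "\u0392")    # True: beta becomes capital Beta
```

The sharp s is the only one of the two letters whose case mapping produces Latin "s" characters, which is why only that spelling runs into the reserved word.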

I would guess that you'll want some external style rules to prevent bogus
mixing of letters from different character sets. That's not any worse than
the style rules for capitalization and indentation that GNAT can enforce.
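
A mixed-script style check of that kind could look roughly like the following Python sketch. It is a heuristic under stated assumptions: the script is guessed from the first word of each character's Unicode name, and scripts_used and mixes_scripts are hypothetical helper names, not part of GNAT or any existing tool.

```python
import unicodedata

def scripts_used(identifier: str) -> set:
    """Return a rough set of scripts for the letters in an identifier.

    Heuristic (an assumption): the first word of the Unicode character
    name, e.g. "LATIN", "GREEK", "CYRILLIC", is treated as the script.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():  # skip digits and underscores
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts

def mixes_scripts(identifier: str) -> bool:
    """Flag identifiers whose letters come from more than one script."""
    return len(scripts_used(identifier)) > 1

print(mixes_scripts("access"))        # False: all Latin
print(mixes_scripts("acce\u03b2"))    # True: Latin plus Greek beta
print(mixes_scripts("\u0456f"))       # True: Cyrillic i plus Latin f
```

A real checker would more likely use the Unicode script property than character names, but the idea is the same: the rule lives in a tool, not in the language.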

I've always limited myself to using the characters commonly available on
Windows systems (roughly 680 glyphs), and there needs to be something that
checks for use of letters that won't necessarily display well. But all of
that is outside of the language.

It should be pointed out that one of the reasons for Ada's support of
Unicode is that we had a long discussion of how to support Latin-9 (which
contains the euro symbol). Eventually, we decided that that way lies
madness - at least by using Unicode, there is only one definition to worry
about, rather than a set of them. My only regret is that we didn't find a
way to include real runtime UTF-8 support in the language: it's wasteful to
store everything as 32-bit characters.
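
The storage cost is easy to see with a quick Python sketch (Python is used only for illustration; encoded_size is a made-up helper, and the "utf-32-le" codec is chosen so that no byte-order mark is counted):

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

for word in ("access", "acce\u00df"):
    utf8 = encoded_size(word, "utf-8")
    utf32 = encoded_size(word, "utf-32-le")  # fixed 4 bytes per character
    print(word, utf8, utf32)

# "access": 6 bytes in UTF-8, 24 bytes as 32-bit characters
# "acceß":  6 bytes in UTF-8 (sharp s takes 2), 20 bytes as 32-bit characters
```

For mostly-ASCII text the fixed 32-bit form is about four times larger than UTF-8.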

                            Randy.