From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,a50a3c40267219cc
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2001-10-16 13:12:53 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!newsfeeds.belnet.be!news.belnet.be!newsfeed00.sul.t-online.de!newsfeed01.sul.t-online.de!t-online.de!newsfeed.germany.net!newsfeed2.easynews.net!easynews.net!news.cid.net!news.enyo.de!news1.enyo.de!not-for-mail
From: Florian Weimer <fw@deneb.enyo.de>
Newsgroups: comp.lang.ada
Subject: Re: Modern languages are case sensitive?
Date: Tue, 16 Oct 2001 22:32:13 +0200
Organization: Enyo's not your organization
Message-ID: <874rozmilu.fsf@deneb.enyo.de>
References: <3105e154.0110150021.32ff5426@posting.google.com>
 <9qeg5r$266$1@trog.dera.gov.uk> <3BCB2E0B.5D7894CD@boeing.com>
 <5ee5b646.0110160342.23b9481c@posting.google.com>
 <87g08jobpp.fsf@deneb.enyo.de> <9qhpq7$8241@news.cis.okstate.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Xref: archiver1.google.com comp.lang.ada:14745
List-Id: <comp.lang.ada>

David Starner <dvdeug@x8b4e53cd.dhcp.okstate.edu> writes:

> On Tue, 16 Oct 2001 17:18:10 +0200, Florian Weimer <fw@deneb.enyo.de> wrote:
>> Of course, this is only the visual presentation. ;-) The actual
>> representation uses alternative representations of ASCII characters
>> (LATIN SMALL LETTER DOTLESS I followed by COMBINING DOT ABOVE) and a
>> ZERO WIDTH SPACE.
>
> Then why didn't you type in the actual Unicode? 

;-)

> Because it would be very hard to?

Actually, using GNU Emacs, Quail, and the proper input method, it's
easy.

> Because it wouldn't have fooled anyone?

Probably true.  There are few applications which treat combining
characters correctly (at least for GNU, I don't know about other
operating systems).
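
To make the trick concrete, here is a minimal sketch in Python,
assuming only the standard unicodedata module.  The names plain,
lookalike and describe are made up for illustration.  It suggests
that the look-alike sequence stays distinct from an ASCII "i" under
all four Unicode normalization forms, so a tool that merely
normalizes its input would not catch it.

```python
import unicodedata

# Illustrative names; none of them come from the original post.
plain = "i"                     # ASCII LATIN SMALL LETTER I
lookalike = "\u0131\u0307"      # DOTLESS I followed by COMBINING DOT ABOVE
with_zwsp = "Put\u200bLine"     # contains an invisible ZERO WIDTH SPACE


def describe(text):
    """Return the Unicode character names that make up text."""
    return [unicodedata.name(ch) for ch in text]


# Compare both strings under every normalization form.
forms = ("NFC", "NFD", "NFKC", "NFKD")
still_distinct = all(
    unicodedata.normalize(form, lookalike) != unicodedata.normalize(form, plain)
    for form in forms
)

print(describe(lookalike))
print("distinct after normalization:", still_distinct)
print("length with zero-width space:", len(with_zwsp))
```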

>> Clearly, Unicode is not suitable for identifiers.  There are five
>> different ways to represent a symbol which looks like a capital H!
>
> And in ASCII there are three symbols that look like a vertical line,
> and two that look like a circle, and they're used to confuse things
> all the time.

In most cases, you can use fonts which highlight these differences
(IBM did this with their PC, and it has stuck since: "0" has got a dot
in the middle, and "|" a gap).

With Unicode, things are a bit different.  Perhaps you could use
different typefaces for different languages, but at least today, there
are very few complete Unicode fonts, and chances are small that a few
of them are available on a single system.  Or colors can highlight
differences.  Or you could turn off processing of combining characters
and non-spacing space characters when editing source code (reducing
the level of Unicode compatibility).  I don't know which approach is
best, which is why I continue to use an ASCII subset.  Unfortunately,
people are eager to use Unicode identifiers everywhere, even in email
addresses. :-(
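
One way the "highlight the differences" idea might look in practice
is sketched below in Python.  It is only an assumption about how
such a check could work, not a description of any existing editor
feature.  The helper flag_suspicious and the category set are
hypothetical; the sketch reports combining marks (general categories
Mn, Mc, Me) and format characters such as ZERO WIDTH SPACE (Cf) so
that they can be shown explicitly instead of being rendered.

```python
import unicodedata

# General categories treated as suspicious in source code (an assumption):
# Mn/Mc/Me are combining marks, Cf covers invisible format characters.
SUSPICIOUS = {"Mn", "Mc", "Me", "Cf"}


def flag_suspicious(line):
    """Return (column, code point, name) for each suspicious character."""
    hits = []
    for col, ch in enumerate(line):
        if unicodedata.category(ch) in SUSPICIOUS:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((col, f"U+{ord(ch):04X}", name))
    return hits


for hit in flag_suspicious("Put_L\u0131\u0307ne (X\u200bY);"):
    print(hit)
```

Note that the dotless i itself is an ordinary lowercase letter and
is not flagged here; only the combining dot and the zero-width space
are.  Catching the base letter needs a script or range restriction.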

> If you speak Russian or Hebrew or Japanese natively and English
> poorly or not at all, Unicode identifiers are much clearer.

For most applications, I've given up using German identifiers.
Most API identifiers are based on English words, and the mixture
just looks awkward.  Perhaps in some cases, I choose a suboptimal
identifier with unwanted connotations or miss the best one because
the word is not in my active vocabulary, but even native speakers make
such mistakes from time to time.

> I've just added to the list of things to add to GNAT in my Copious Free
> Time the option to restrict Unicode identifiers to the Latin script, as
> that would solve most of your problems.

I think a stronger restriction is already in place; only characters in
Row 00 of the Basic Multilingual Plane are allowed (which corresponds
to the MIME charset known as ISO-8859-1).  I haven't checked the
non-standard GNAT modes, however.
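
As a rough illustration of what a Row 00 restriction means, the
following Python sketch tests whether every character of an
identifier has a code point of at most U+00FF.  The function name
in_row_00 is hypothetical, and the check looks at the code point
range only; it does not model which of those characters Ada or GNAT
actually accept as identifier letters.

```python
def in_row_00(identifier):
    """True if every character lies in Row 00 of the BMP (U+0000..U+00FF)."""
    return all(ord(ch) <= 0xFF for ch in identifier)


print(in_row_00("Put_Line"))       # plain ASCII
print(in_row_00("Gr\u00f6\u00dfe"))  # German letters from ISO-8859-1
print(in_row_00("L\u0131ne"))      # dotless i, U+0131, is outside Row 00
```

Under such a rule the dotless i, the combining dot and the
zero-width space from the earlier trick all fall outside the allowed
range.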