From: Florian Weimer
Newsgroups: comp.lang.ada
Subject: Re: Modern languages are case sensitive?
Date: Tue, 16 Oct 2001 22:32:13 +0200
Organization: Enyo's not your organization
Message-ID: <874rozmilu.fsf@deneb.enyo.de>
References: <3105e154.0110150021.32ff5426@posting.google.com>
 <9qeg5r$266$1@trog.dera.gov.uk> <3BCB2E0B.5D7894CD@boeing.com>
 <5ee5b646.0110160342.23b9481c@posting.google.com>
 <87g08jobpp.fsf@deneb.enyo.de> <9qhpq7$8241@news.cis.okstate.edu>

David Starner writes:

> On Tue, 16 Oct 2001 17:18:10 +0200, Florian Weimer wrote:
>
>> Of course, this is only the visual presentation. ;-) The actual
>> representation uses alternative representations of ASCII characters
>> (LATIN SMALL LETTER DOTLESS I followed by COMBINING DOT ABOVE) and a
>> ZERO WIDTH SPACE.
>
> Then why didn't you type in the actual Unicode? ;-)
> Because it would be very hard to?

Actually, using GNU Emacs, Quail, and the proper input method, it's
easy.

> Because it wouldn't have fooled anyone?

Probably true. There are few applications which treat combining
characters correctly (at least for GNU, I don't know about other
operating systems).
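
The trick quoted above (a dotless i with a combining dot, plus a zero
width space) can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the identifier "pin" and the
helper describe() are made up for the example, and Python's standard
unicodedata module stands in for whatever tool one would really use.

```python
import unicodedata

# Hypothetical identifier, chosen only for illustration.
plain = "pin"

# Same visual shape in a renderer that handles combining characters:
# 'p', DOTLESS I, COMBINING DOT ABOVE, ZERO WIDTH SPACE, 'n'.
disguised = "p\u0131\u0307\u200bn"


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# The two strings differ as sequences of code points.
print(plain == disguised)
print(len(plain), len(disguised))

for name in describe(disguised):
    print(name)

# Normalization does not merge them: DOTLESS I has no decomposition
# that would turn it (plus the combining dot) into an ordinary 'i'.
print(unicodedata.normalize("NFKC", disguised) == plain)
```

A compiler or editor that compares identifiers code point by code point
sees two different names here, even though a careful renderer draws them
alike.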

>> Clearly, Unicode is not suitable for identifiers. There are five
>> different ways to represent a symbol which looks like a capital H!
>
> And in ASCII there are three symbols that look like a vertical line,
> and two that look like a circle, and they're used to confuse things
> all the time.

In most cases, you can use fonts which highlight these differences
(IBM did this with their PC, and it has stuck since: "0" has got a dot
in the middle, and "|" a hole).

With Unicode, things are a bit different. Perhaps you could use
different typefaces for different languages, but at least today, there
are very few complete Unicode fonts, and chances are small that several
of them are available on a single system. Or colors can highlight
differences. Or you could turn off processing of combining characters
and non-spacing space characters when editing source code (reducing the
level of Unicode compatibility). I don't know which approach is best;
that's why I continue to use an ASCII subset.

Unfortunately, people are eager to use Unicode identifiers everywhere,
even in email addresses. :-(

> If you speak Russian or Hebrew or Japanese natively and English
> poorly or not at all, Unicode identifiers are much clearer.

For most applications, I've given up using German identifiers. Most
API identifiers are based on English words, and the mixture just looks
awkward. Perhaps in some cases, I choose a suboptimal identifier with
unwanted connotations or miss the best one because the word is not in
my active vocabulary, but even native speakers make such mistakes from
time to time.

> I've just added to the list of things to add to GNAT in my Copious Free
> Time the option to restrict Unicode identifiers to the Latin script, as
> that would solve most of your problems.

I think a stronger restriction is already in place; only characters in
Row 00 of the Basic Multilingual Plane are allowed (which corresponds
to the MIME charset known as ISO-8859-1).
I haven't checked the non-standard GNAT modes, however.
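
The Row 00 restriction amounts to a range check on code points. A
minimal sketch in Python, with the caveat that in_row_00() and
offending() are hypothetical helpers written for this example: they
test only the code point range (U+0000 to U+00FF), not Ada identifier
syntax, and they say nothing about what GNAT itself implements.

```python
import unicodedata


def in_row_00(identifier):
    """True if every character lies in Row 00 (U+0000..U+00FF)."""
    return all(ord(ch) <= 0xFF for ch in identifier)


def offending(identifier):
    """Names of the characters that fall outside Row 00."""
    return [
        unicodedata.name(ch, "U+%04X" % ord(ch))
        for ch in identifier
        if ord(ch) > 0xFF
    ]


# Sample identifiers, made up for illustration. The last one starts
# with GREEK CAPITAL LETTER ETA, which looks like a Latin H in many
# fonts.
for candidate in ["Height", "Gr\u00f6\u00dfe", "\u0397eight"]:
    print(ascii(candidate), in_row_00(candidate), offending(candidate))
```

Such a check rejects the Greek lookalike and the dotless i (both lie
outside Row 00) while still accepting German umlauts.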