From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: Ada generics
Date: Thu, 28 Dec 2006 18:35:06 +0100

On Thu, 2006-12-28 at 11:59 +0100, Dmitry A. Kazakov wrote:
> > so it cannot ignore Unicode issues if it wishes to allow Unicode
> > characters in identifiers.
>
> Which is a BAD idea, IMO.
>
> We cannot know anything about properties of letters in Klingon. As a
> practical example consider Russian, where e can be (and is) used in
> place of ё (see http://en.wikipedia.org/wiki/%D0%81), but not the
> reverse. Or maybe we should make Ada compilers capable of detecting
> programs written by Germans, so as to consider ü and ue the same?

Yes, writing source code is a question of being practical, which is
probably not easily formalized... An international character set for
portable programs leaves only a few choices open if they are to be
practical, does it not? Naturally, mathematical fancies like
completeness, freedom from contradiction, etc. are out of the question
when it comes to writing for both humans and computers. What's the
point of having a high-level language when you are only allowed
identifiers that the most simplistic mechanical interpreter can
"understand"?

Why is it that programmers become somewhat irrational and impractical
when it comes to character sets? They happily devise all kinds of
pattern recognition algorithms, tricky transformations, ways to get the
best out of fuzzy measurement procedures, and so on. But not so with
character sets. No no, every schoolchild knows that characters must be
such and such... (Maybe early exposure to characters is to blame here;
everyone is an expert. :-)

Anyway, do we have some data we could discuss that would show the
practical importance of Unicode/casing issues? Or do we really have
programmers who are well versed in using a keyboard connected to a
computer and still can't write a program that tells apple characters
from orange characters?
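Just so we are looking at the same thing, here is a tiny sketch of what
an identifier beyond ASCII looks like in practice. (A toy, not a
recommendation: the names are invented, and with GNAT you would need
something like the -gnatW8 switch so the source is read as UTF-8.)

with Ada.Text_IO;

--  Ada 2005 lets identifiers use letters beyond ASCII, so 'Größe'
--  ("size") is a legal name.  Whether 'Größe' and 'Groesse' ought
--  ever to denote the same entity is exactly the question at issue.
procedure Unicode_Identifier_Demo is
   Größe : constant Positive := 42;
begin
   Ada.Text_IO.Put_Line (Positive'Image (Größe));
end Unicode_Identifier_Demo;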
GNAT already detects identifiers that are spelled similarly: when one
of them is misspelled, the error message lists its "relatives". Surely
a helpful feature, and proof that practical handling of natural
language identifiers is possible.

As an example, since you have been referring to German, consider that
the sharp s, 'ß', is usually written "SS" when capitalized. So "Straße"
tends to become "STRASSE". Now if you have a compound word that has
- a 'ß', and
- an 's' right after it,
such as "Maßstab" (= scale, rule, yardstick), then from a simple-minded
formalist's perspective I could argue: "Using Unicode is nonsense
because there is no 1:1 mapping for the German word 'Maßstab', which
will become 'MASSSTAB'. "SSS" is ambiguous: it could be "sß" or it
could be "ßs". That's too big a challenge for a compiler writer. So
leave me alone with your Unicode and case insensitivity." Is that what
computer science has to answer when asked about character handling?
(A rough sketch of this kind of folding appears at the end of this
post.)

Challenge: try to find a significant number of German words that have
an 's' before a 'ß'. What is the consequence of your findings?

Even if there are ambiguities in other languages, ambiguities are not
new to Ada (nor to C++, IIRC), and they have been addressed. (It seems
that the introduction of Unicode in the sixth revision of Scheme has
recently made that Lisp case sensitive, based on arguments such as the
one above. To me this shows "practicality" on the part of the language
designer, vulgo just compiler writer's laziness.)

If the programmers' representatives (the ARG, for example) agree that
it is practical to exclude some casing rules or "representation rules",
such as "ue" <-> 'ü', I'm perfectly happy. Because the rule *is*
practical, it helps get work done, and to hell with mathematical
fancies and game-theoretic character-shuffling possibilities when they
do not really matter.

> What about parsing the source right to left, or top to bottom?

The writing direction problem is solved. Similarly, it has proved
possible and practical to connect big-endian and little-endian
computers and have them cooperate, using suitable algorithms. Both
kinds exist, as do apples, oranges, bananas, and pineapples. We can
make nice fruit salads.
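For the curious, here is the rough sketch mentioned above: a
hand-rolled toy, not a library routine. The folding rule (expand 'ß'
to "ss", lower ASCII capitals, pass everything else through) is
deliberately naive, and again GNAT would need the source read as
UTF-8 (-gnatW8).

with Ada.Wide_Wide_Text_IO;  use Ada.Wide_Wide_Text_IO;

procedure Fold_Demo is

   --  Naive folding: 'ß' expands to "ss", ASCII capitals are lowered,
   --  everything else passes through unchanged.
   function Fold (S : Wide_Wide_String) return Wide_Wide_String is
   begin
      if S'Length = 0 then
         return "";
      elsif S (S'First) = 'ß' then
         return "ss" & Fold (S (S'First + 1 .. S'Last));
      elsif S (S'First) in 'A' .. 'Z' then
         return Wide_Wide_Character'Val
                  (Wide_Wide_Character'Pos (S (S'First)) + 32)
                & Fold (S (S'First + 1 .. S'Last));
      else
         return S (S'First) & Fold (S (S'First + 1 .. S'Last));
      end if;
   end Fold;

begin
   --  Both spellings fold to "massstab"; the mapping is not 1:1,
   --  which is the alleged show-stopper discussed above.
   Put_Line (Fold ("Maßstab"));
   Put_Line (Fold ("Massstab"));
end Fold_Demo;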