From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 13 Oct 2011 14:13:36 +0200
From: Georg Bauhaus <rm.dash-bauhaus@futureapps.de>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6;
 rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: sharp =?UTF-8?B?w58gYW5kIHNzIGluIEFkYSBrZXl3b3JkcyBsaWtlIEFD?=
 =?UTF-8?B?IENFU1M=?=
References: <jzbw65n7sj1o.1c75ryih8kppi$.dlg@40tude.net>
 <665628584340145751.161513rm-host.bauhaus-maps.arcor.de@news.arcor.de>
 <1tgwf2ey7q1qz.hpcw6dmx2aj2$.dlg@40tude.net>
In-Reply-To: <1tgwf2ey7q1qz.hpcw6dmx2aj2$.dlg@40tude.net>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Message-ID: <4e96d5f0$0$6541$9b4e6d93@newsspool4.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 13 Oct 2011 14:13:36 CEST
NNTP-Posting-Host: 84b537b3.newsspool4.arcor-online.net
X-Complaints-To: usenet-abuse@arcor.de
Xref: g2news1.google.com comp.lang.ada:21410
Date: 2011-10-13T14:13:36+02:00
List-Id: <comp.lang.ada>

On 13.10.11 10:10, Dmitry A. Kazakov wrote:

> How is "acceβ" worse than "acceß"? 

The first identifier mixes two different "alphabets";
The second identifier uses variant spelling.
The second identifier can be changed without debate.
This is how "acceβ" is worse than "acceß".

The first identifier, mixing two "alphabets", is
like a literal mixing Roman numerals and Arabic numerals
to form a single numeric literal:

   Some_Year : constant := MCD92;

Yeah, that's fun. And there are no Chinese digits
in it, so Europeans and Americans will likely recognize the
intent. But otherwise? What's the point of mixing two
numeric alphabets?


>>>    type Acceß_Type is access Integer;
>>>    type Access_Type is access String;
>>
>> I'd prefer them to be the same in this particular 
>> case, since the Swiss model (which is without ß)
>> is working.
> 
> What is the reason for them to be same?

"the Swiss model (which is without ß) is working."
It is more than technical work; see below.


> How do you know in which alphabet is "Mass"? Why should it conflict with
> "Maß" for some French programmer?

The identifiers shouldn't be in conflict, but Ada
makes them conflict; Randy has stated one reason.

I can think of two reasons for ss = ß but I /= І:

1) I /= І, since they are from "alphabets" that real people
   think are different. To make the example less artificial,
   let Αδα /= Ada.  Put your projects at risk and hire
   programmers who would write Aδα (A["03B4"]["03B1"]).
   The compiler will help you find them out.

2) ss = ß because real people think and act as though
   they are the same, and, importantly, more so (note
   the non-binary, comparative phrase) WRT ss = ß than
   WRT ä = ae, since absence of ä is considered a computer
   thingy, but equivalence of ss and ß is well established
   with or without computer. It is a different issue.
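
The two rules above can be tried out with a minimal sketch in
Python (not Ada). It assumes that Unicode full case folding, as
exposed by str.casefold, is an acceptable stand-in for "real
people think they are the same"; same_identifier is a made-up
helper name, and none of this claims to be what an Ada compiler
does.

```python
def same_identifier(a: str, b: str) -> bool:
    """Compare two names under Unicode full case folding (an assumption)."""
    return a.casefold() == b.casefold()


# ss = ß: full case folding maps U+00DF to "ss".
print(same_identifier("Maß", "Mass"))                # True

# I /= І: Latin U+0049 and Cyrillic U+0406 fold to different letters.
print(same_identifier("I", "\u0406"))                # False

# Ada /= Αδα: the second name is written entirely in Greek.
print(same_identifier("Ada", "\u0391\u03b4\u03b1"))  # False
```

Case folding alone knows nothing about "alphabets"; it merely
happens to give the answers argued for in (1) and (2).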


Engineers have projected the situation onto the technical
axis and wisely introduced a capital ß.  This will not
resolve (2) above. The programming situation is placed
somewhere along several axes, not just a technical one.
I therefore speculate that capital ß will add another
power of two to the complexity of the issue in real life.
(Or, finally, make us learn how to use Unicode properly.)

"Mass" has four Latin characters, "Maß" has three,
if you ask any child; they are both written using
Latin characters if you ask any programmer: not Greek,
not Hebrew, not Cyrillic, or anything, just Latin.


> If "Latin" does not mean Latin, then you need yet another nonsensical rule
> to redefine it.

"Latin" is here meant to refer to the general thing.
"Latin characters used in Europe" is pretty clear,
and no one will sue you if you include ä or ł.

Declaring simple unions of sections from Unicode is easy,
and consistent.  You just ask some programmers who know
both character sets and who have been programming for some
time.  The intent is not to bend existing logic, the
intent is to have rules that prevent too much fun with
identifiers (note the comparative).
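
A rough sketch of such a declaration, in Python rather than Ada.
The ranges below are my own pick of sections from Unicode (Basic
Latin letters, Latin-1 and Latin Extended-A/B letters, the Greek
and Cyrillic blocks), not anything a standard prescribes, and the
names ALPHABETS, alphabets_used and is_single_alphabet are
hypothetical.

```python
# Each "alphabet" is a simple union of code point ranges (assumed choice).
ALPHABETS = {
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A),
              (0x00C0, 0x00D6), (0x00D8, 0x00F6), (0x00F8, 0x024F)],
    "Greek": [(0x0370, 0x03FF)],
    "Cyrillic": [(0x0400, 0x04FF)],
}


def alphabet_of(ch: str) -> str:
    """Return the name of the union containing ch, or "Other"."""
    cp = ord(ch)
    for name, ranges in ALPHABETS.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return "Other"


def alphabets_used(identifier: str) -> set:
    """Collect the alphabets of all letters; digits and '_' are neutral."""
    return {alphabet_of(ch) for ch in identifier
            if not (ch.isdigit() or ch == "_")}


def is_single_alphabet(identifier: str) -> bool:
    """True if every letter comes from one declared alphabet."""
    used = alphabets_used(identifier)
    return len(used) == 1 and "Other" not in used


print(is_single_alphabet("acceß"))        # True: ß is in the Latin union
print(is_single_alphabet("acce\u03b2"))   # False: Latin plus Greek beta
print(is_single_alphabet("Access_Type"))  # True
```

A rule like this rejects the mixed "acceβ" and leaves "acceß",
ä and ł alone, which is all the fun-prevention asked for here.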

Glyphs don't help at all, either when constructing an issue
or when resolving an issue. For example, in my terminal window,
the second І looks like |, it does not have serifs. I does.
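
Since the glyphs settle nothing, asking for the characters
themselves does. A small Python sketch using the standard
unicodedata module; describe is a made-up helper name.

```python
import unicodedata


def describe(text: str) -> list:
    """List (code point, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


# Latin I, Cyrillic І, Latin ß, Greek β, written as escapes so that
# no font can get in the way.
for code_point, name in describe("I\u0406\u00df\u03b2"):
    print(code_point, name)
```

Whatever the terminal draws, the names say LATIN, CYRILLIC,
LATIN, GREEK.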


> Who are these people? 

The people who influence standards.


> Except that half of those languages use no Cyrillic letters at all (e.g.
> Polish).

Poles will still consider sets of Cyrillic characters
to be related, I should think.  Possibly more so than people
west of Poland if Poles more frequently know another Slavic
language.

>> If a word looks like a mix of Cyrillic characters,
> 
> You cannot see characters, you do glyphs.

That's techno-speak again, but programmers see characters
if you ask them. Techno-think is not alone in establishing
writing habits.

> You cannot
> safely recognize alphabet looking at a single word. 

I am looking at programs, not at single words.

> 
>> A programmer
>> seeing Cyrillic characters will, on average, be
>> right in assuming that he is seeing some
>> identifier written in some Slavic language.
> 
> Program legality based on statistic analysis? That must be a lot of fun!

We employ tons of statistics when reading text.
Reading programs is also best when it is fun. We also employ
averaging when writing programs (which pattern has worked best,
what is a good name for this thing, what has worked in the past,
etc.)  Hoare has suggested the use of a macro named
PRELIMINARY_ASSUMPTION, IIRC, so "assuming" seems a normal
attitude when programming, in general.

__
[*] I am referring to the German writers making a scene vis-à-vis
the spelling reform and talking seriously about the destruction of
culture.