From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: a07f3367d7,8ea33c39efc56ac3
X-Google-Attributes: gida07f3367d7,public,usenet
X-Google-NewGroupId: yes
X-Google-Language: ENGLISH,UTF8
Path: 
 g2news1.google.com!news1.google.com!news.glorb.com!aioe.org!.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Newsgroups: comp.lang.ada
Subject: Re: sharp =?iso-8859-1?Q?=DF?= and ss in Ada keywords like AC CESS
Date: Thu, 13 Oct 2011 10:10:01 +0200
Organization: cbb software GmbH
Message-ID: <1tgwf2ey7q1qz.hpcw6dmx2aj2$.dlg@40tude.net>
References: <jzbw65n7sj1o.1c75ryih8kppi$.dlg@40tude.net>
 <665628584340145751.161513rm-host.bauhaus-maps.arcor.de@news.arcor.de>
Reply-To: mailbox@dmitry-kazakov.de
NNTP-Posting-Host: FbOMkhMtVLVmu7IwBnt1tw.user.speranza.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
X-Complaints-To: abuse@aioe.org
User-Agent: 40tude_Dialog/2.0.15.1
X-Notice: Filtered by postfilter v. 0.8.2
Xref: g2news1.google.com comp.lang.ada:21408
Date: 2011-10-13T10:10:01+02:00
List-Id: <comp.lang.ada>

On 12 Oct 2011 22:56:38 GMT, Georg Bauhaus wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote:
> 
>>> "acceβ" will be an error, because it mixes two "alphabets",
>>> Latin and Greek.
>> 
>> ß has nothing to do with Greek alphabet, it is a ligature promoted to a
>> separate character. 
> 
> "acceβ" has a Greek BETA, and the compiler would have noticed.

Why should it be an error? How is "acceβ" worse than "acceß"?

>>    type Acceß_Type is access Integer;
>>    type Access_Type is access String;
> 
> I'd prefer them to be the same in this particular 
> case, since the Swiss model (which is without ß)
> is working.

What is the reason for them to be the same?

>> Do these identifiers conflict?
>> 
>>    I : Integer; -- Latin I
>>    І : Integer; -- Ukrainian I
> 
> Different alphabets, different identifiers.

How do you know which alphabet "Mass" is in? Why should it conflict with
"Maß" for some French programmer?

Rules should be *reasonable*. There cannot be any reasonable rule under
which ß = ss, but I /= І.
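
For illustration only, here is a minimal sketch of how the Unicode tables
themselves treat these two cases. It is Python rather than Ada, used purely
to query the tables, and it assumes NFKC normalization plus full case
folding as the comparison. That is an assumption of the sketch, not a claim
about what any Ada standard prescribes. The helper name fold is invented
for the example.

```python
import unicodedata

def fold(s):
    # Hypothetical comparison key: NFKC normalization, then full case folding.
    return unicodedata.normalize("NFKC", s).casefold()

LATIN_I = "\u0049"      # LATIN CAPITAL LETTER I
UKRAINIAN_I = "\u0406"  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I

# "Maß" and "Mass" collapse to the same key under full case folding.
sharp_s_equals_ss = fold("Ma\u00df") == fold("Mass")

# The two visually identical I's keep distinct keys.
latin_i_equals_ukrainian_i = fold(LATIN_I) == fold(UKRAINIAN_I)

print(sharp_s_equals_ss)
print(latin_i_equals_ukrainian_i)
```

Under these assumptions the first comparison holds and the second does not,
which is exactly the asymmetry in question.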

>> Is ö Latin? Are k, u, w Latin? BTW, Latin script was all upper case.
> 
> Yes, they are to be classified as Latin, because
> programmers are used to it, and the relevant standards
> apply, too.  W is double-v or double-u if you insist that
> history should play a part, etc. but this only complicates
> the matter for programming, without need, IMHO.

If "Latin" does not mean Latin, then you need yet another nonsensical rule
to redefine it.

>>> - Cyrillic characters
>> 
>> "Cyrillic characters" is a wild mixture of various characters and
>> ligatures of (like German ß) from different national Cyrillic alphabets,
>> with borrowing from Greek, Latin and later inventions. There is no reason
>> to treat combinations of those as something cohesive.
> 
> I think it is reasonable to define useful, simple sets of the
> characters that people will consider related:

Who are these people? How would you do that, and why should the Ada
language care?

BTW 1, show me a natural language alphabet in which "_" is a letter?

BTW 2, "'" is a letter in some Russian texts, and it is used as a letter (a
part of a written word) in German, English and, I presume, in many other
languages I don't know. Nevertheless, it is not a letter according to
Unicode and not a letter in Ada.
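
The Unicode side of both remarks can be checked against the general
category table. A small sketch in Python (chosen only because its standard
library exposes that table; the dictionary name categories is made up for
the example):

```python
import unicodedata

# General category of each character: L* means letter, P* means punctuation.
samples = ["_", "'", "\u00df", "\u0406"]
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X} {categories[ch]} {unicodedata.name(ch)}")
```

Neither "_" nor "'" falls into a letter category there, while ß and the
Ukrainian І both do.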

> That is, write using sets of characters
> that people will consider related, in this practical sense,
> when using Slavic languages.

Except that half of those languages use no Cyrillic letters at all (e.g.
Polish).

>>> But this should be fairly easy
>>> to implement,
>> 
>> It is not about implementation, it is about understanding the rules without
>> looking into the categorization tables.
> 
> If a word looks like a mix of Cyrillic characters,

You cannot see characters, you see glyphs. Glyphs used in European languages
are massively shared because all alphabets used there stem from one root
and have influenced each other throughout their history. You cannot safely
recognize an alphabet by looking at a single word.
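
A rough sketch of what the reader is up against, again in Python and only
for illustration. It guesses the "alphabet" of a character from the first
word of its Unicode name. That is a crude heuristic assumed for this
example, not the real Unicode Script property, and the function names are
invented.

```python
import unicodedata

def script_guess(ch):
    # Heuristic only: first word of the character name, e.g. "LATIN".
    return unicodedata.name(ch).split()[0]

def scripts_in(word):
    # Set of guessed scripts over all characters of the word.
    return {script_guess(ch) for ch in word}

# Pairs that typically render with the same glyph but are different characters.
lookalikes = [("I", "\u0406"), ("C", "\u0421"), ("o", "\u043e")]
for a, b in lookalikes:
    print(a, script_guess(a), "|", b, script_guess(b))

print(scripts_in("acce\u03b2"))  # Greek beta at the end
print(scripts_in("\u0394T"))     # Greek Delta followed by Latin T
```

The table lookup separates each pair; the eye does not.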

> A programmer
> seeing Cyrillic characters will, on average, be
> right in assuming that he is seeing some
> identifier written in some Slavic language.

Program legality based on statistical analysis? That must be a lot of fun!

>> BTW, why "ΔT" should be illegal?
> 
> Yes, illegal if the alphabet rules apply.

*Why* should they apply? You should give some basic principles for your
rules, language-independent ones, e.g. readability, simplicity of use etc.
How does "I is not I" improve readability?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de