sharp ß and ss in Ada keywords like ACCESS

comp.lang.ada

* sharp ß and ss in Ada keywords like ACCESS
@ 2011-10-10 16:30 Georg Bauhaus
  2011-10-10 16:46 ` Adam Beneschan
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-10 16:30 UTC (permalink / raw)


The history of the USA harbors an interesting specimen of sharp-s,
its origin and meaning, as I learned just now.
It is not from a text written in German, though; rather, it is heading,
TA DA, the Bill of Rights:

http://www.archives.gov/exhibits/charters/bill_of_rights_zoom_1.html

Just in case you wonder what sharp-s is in German, a few decades ago
German writers would have written Kongreß where they now write Kongress.
Look at Congress in the text above to see why. ;-)

The following small article, from a 1949 German newspaper,
features President Truman, nuclear bombs, and said word on line 15:

http://pdfarchiv.zeit.de/1949/39/atom-wettrennen.pdf


  type P is acceß T;
   -- pre-reform spelling :-)



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 16:30 sharp ß and ss in Ada keywords like ACCESS Georg Bauhaus
@ 2011-10-10 16:46 ` Adam Beneschan
  2011-10-10 18:23   ` Georg Bauhaus
                     ` (2 more replies)
  2011-10-10 17:22 ` sharp ß and ss in Ada keywords like ACCESS Simon Wright
  2011-10-10 17:45 ` AdaMagica
  2 siblings, 3 replies; 32+ messages in thread
From: Adam Beneschan @ 2011-10-10 16:46 UTC (permalink / raw)

On Oct 10, 9:30 am, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
wrote:
> The history of the USA harbors an interesting specimen of sharp-s,
> its origin and meaning, as I learned just now.
> It is not from a text written in German, though; rather, it is heading,
> TA DA, the Bill of Rights:
>
> http://www.archives.gov/exhibits/charters/bill_of_rights_zoom_1.html

It's clearly two separate letters there.  And my understanding was
that the origins of the "sharp s" was that it was a combination of s
and z (not of the longer and smaller forms of "s" that we see in the
Bill of Rights).  When I took German in high school about 35 years
ago, the character was called "ess-zed", suggesting that origin.  I
think that's what my father (whose first language was German) called
it too.  I never heard the term "sharp s" until the issue arose in ARG
discussions.

Not sure what point I'm trying to make here.  And I certainly don't
mean to suggest that I know more about your native language than you
do.

                 -- Adam


* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 16:30 sharp ß and ss in Ada keywords like ACCESS Georg Bauhaus
  2011-10-10 16:46 ` Adam Beneschan
@ 2011-10-10 17:22 ` Simon Wright
  2011-10-10 17:45 ` AdaMagica
  2 siblings, 0 replies; 32+ messages in thread
From: Simon Wright @ 2011-10-10 17:22 UTC (permalink / raw)


Georg Bauhaus <rm.dash-bauhaus@futureapps.de> writes:

> The history of the USA harbors an interesting specimen of sharp-s,
> its origin and meaning, as I learned just now.
> It is not from a text written in German, though; rather, it is heading,
> TA DA, the Bill of Rights:
>
> http://www.archives.gov/exhibits/charters/bill_of_rights_zoom_1.html

I'd have called that 'long s': http://en.wikipedia.org/wiki/Long_s
(which mentions "the German double s, ß" as an example of a ligature).




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 16:30 sharp ß and ss in Ada keywords like ACCESS Georg Bauhaus
  2011-10-10 16:46 ` Adam Beneschan
  2011-10-10 17:22 ` sharp ß and ss in Ada keywords like ACCESS Simon Wright
@ 2011-10-10 17:45 ` AdaMagica
  2 siblings, 0 replies; 32+ messages in thread
From: AdaMagica @ 2011-10-10 17:45 UTC (permalink / raw)


That's not a sharp s, ess-zet. It's simply the long s followed by the
round s, as was used some time ago in English.

Imagine the sentence "Where the bee sucks, there suck I" written with
long s (this to our modern eyes is easily mistaken as an f).

The round s was only used at word ends, as in German Fraktur (and in
Sütterlin), where you have a long and a round s: long at the beginning
and in the middle, round at the end of a word.

Today you see many a "Gasthaus" written in Fraktur with the wrong s,
the round one, where they should have used the long one.




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 16:46 ` Adam Beneschan
@ 2011-10-10 18:23   ` Georg Bauhaus
  2011-10-10 22:25     ` sharp ß " Randy Brukardt
  2011-10-11 17:33     ` sharp ß " Martin Krischik
  2011-10-11  7:33   ` sharp ß and ss in Ada keywords like ACCESS Yannick Duchêne (Hibou57)
  2011-10-11 17:26   ` sharp ß and ss in Ada keywords like ACCESS (better not) Martin Krischik
  2 siblings, 2 replies; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-10 18:23 UTC (permalink / raw)

On 10.10.11 18:46, Adam Beneschan wrote:
> On Oct 10, 9:30 am, Georg Bauhaus <rm.dash-bauh...@futureapps.de>
> wrote:
>> The history of the USA harbors an interesting specimen of sharp-s,
>> its origin and meaning, as I learned just now.
>> It is not from a text written in German, though; rather, it is heading,
>> TA DA, the Bill of Rights:
>>
>> http://www.archives.gov/exhibits/charters/bill_of_rights_zoom_1.html
> 
> It's clearly two separate letters there.  And my understanding was
> that the origins of the "sharp s" was that it was a combination of s
> and z (not of the longer and smaller forms of "s" that we see in the
> Bill of Rights).  When I took German in high school about 35 years
> ago, the character was called "ess-zed", suggesting that origin.  I
> think that's what my father (whose first language was German) called
> it too.  I never heard the term "sharp s" until the issue arose in ARG
> discussions.

To the best of my knowledge, the issue is not settled, and
likely will never be, because it stems from the early days
of writing at all. Several (seueral :-) dialects and pronunciations,
together with several conventions of how early writers represent
speech yield a matrix of possibilities. In addition, experts
distinguish ligature from abbreviation, which at first sight is
taken to be in favor of s+z, but see below for "z form" of s.
Many books from around 1900 do not have ß, but long-s and short-s;
this convention did not last long, though.

However, there is substantial evidence that no z would ever be
combined with an s such that the result is both forming an
"ess-zed" shape, and also meaning s+z, in a word that stems from Latin,
such as "process" ("Prozeß", now "Prozess"). Those words would
only be rendered using 26 characters of pure Latin type anyway,
without ß, and never turning the "ss" from Latin into "sz".
That is, finding "Prozesz" in print or writing anywhere
is highly unlikely.

Rather, some sources suggest a "z form" (shape) after long-s as a writing
convention that assigns to z the meaning of terminal-s; so z becomes
overloaded with s, and thus Congreſs was rendered Kongreß until
recently; it was always rendered KONGRESS in capital letters---KONGRESZ
can safely be ignored as an oddity, even though this exceptional
spelling is formally allowed (by the powers that be).
The "z form" would be in harmony with the combination of long-s short-s
in old style handwriting in German and some Scandinavian languages.
Whatever the origins of ß might be, *every* single rule that kids get
taught at school was, and is, about the relation of ß and ss,
about sounds, and when to use them; "ess-zed" becomes just a name,
not itself implying "z", but reflecting the (former) looks of it. Speakers
of German wouldn't say [koŋgrests] just because "ts" is how z sounds
in German.

I'm interested in the subject because it is one of those undecidable
problems that, nevertheless, produce heated debates over language rules;
Scheme R6RS introduces case sensitivity in part because there is no
1:1 rule for turning "sss" or "ßs" into capital letters...
Which introduces more opportunities for deliberate obfuscation
(define (select MASSE MASZE MAßE) ...)
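
A minimal sketch of the capitalisation problem described above. Python is used here only as a convenient Unicode test bench (an assumption of this example; it is neither Scheme nor Ada), and the word pair is chosen for illustration. Python's `str.upper()` applies the Unicode full case mapping, under which "ß" turns into two letters:

```python
# Illustration only: two different German words, "Masse" (mass) and
# "Maße" (measures), written with "ss" and with "ß" respectively.
words = ["Masse", "Maße"]

# str.upper() uses the Unicode full case mapping: "ß" becomes "SS".
upper = [w.upper() for w in words]
print(upper)

# The two words collide once capitalised, and the length of "Maße"
# changes, so capitals cannot be mapped back 1:1 to the original spelling.
collide = upper[0] == upper[1]
length_changed = len("Maße") != len("Maße".upper())
print(collide, length_changed)
```

With case-insensitive identifiers, a language has to decide which of these spellings count as the same name; with case-sensitive identifiers the question does not arise.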

I'd think that the simplest of the rules is to make "ss" and "ß"
the same, and ask programmers to relax. This has worked in Switzerland
for many, many years.


* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 18:23   ` Georg Bauhaus
@ 2011-10-10 22:25     ` Randy Brukardt
  2011-10-11  7:36       ` Dmitry A. Kazakov
  2011-10-11 17:33     ` sharp ß " Martin Krischik
  1 sibling, 1 reply; 32+ messages in thread
From: Randy Brukardt @ 2011-10-10 22:25 UTC (permalink / raw)



"Georg Bauhaus" <rm.dash-bauhaus@futureapps.de> wrote in message 
news:4e93381d$0$6545$9b4e6d93@newsspool4.arcor-online.net...
...
> I'd think that the simplest of the rules is to make "ss" and "ß"
> the same, and ask programmers to relax. This has worked in Switzerland
> for many, many years.

That's what Unicode does. However, that would be incompatible and 
inconsistent (that is, both compile-time and run-time incompatible) with Ada 
95 identifiers. That would not be acceptable.

Thus, Ada 2012 uses a simpler rule (Ada 2005 just totally screwed this up, 
and it has to be ignored). Thus "acceß" /= "access".

Also note that even if the conversion was allowed, the identifier "acceß" 
would be illegal: 2.3(5.3/3) [and the equivalent rule in Ada 2005] makes it 
illegal to have an identifier that is identical to a reserved word. And 
reserved words use character-by-character case conversion - any case where 
the number of characters change is not considered. Since Ada 2012 uses 
"simple case folding", 2.3(5.3/3) doesn't have any impact, but if it had 
used "full case folding", it would prevent words like the above.
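
A rough sketch of the difference between the two folding styles, written in Python rather than Ada. `str.casefold()` is Python's full case folding; the `simple_fold` helper is a hypothetical stand-in for simple case folding (fold each character separately and keep the result only if it is still a single character), so it approximates the Unicode definition rather than reproducing it, and it says nothing about how an Ada compiler implements the rule:

```python
def simple_fold(text: str) -> str:
    """Approximation of simple case folding: character by character,
    never changing the number of characters."""
    out = []
    for ch in text:
        folded = ch.casefold()
        # Keep the folded form only when it is still one character long.
        out.append(folded if len(folded) == 1 else ch)
    return "".join(out)


def full_fold(text: str) -> str:
    """Full case folding as provided by Python's str.casefold()."""
    return text.casefold()


keyword = "access"
candidate = "acceß"

# Simple folding leaves "ß" alone, so the candidate is not the keyword.
print(simple_fold(candidate) == simple_fold(keyword))

# Full folding expands "ß" to "ss", so the candidate becomes the keyword.
print(full_fold(candidate) == full_fold(keyword))
```

Under the simple variant the comparison is False, under the full variant it is True, which is the situation the paragraph above describes.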

                                                  Randy.







* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 16:46 ` Adam Beneschan
  2011-10-10 18:23   ` Georg Bauhaus
@ 2011-10-11  7:33   ` Yannick Duchêne (Hibou57)
  2011-10-11 14:32     ` Adam Beneschan
  2011-10-11 17:26   ` sharp ß and ss in Ada keywords like ACCESS (better not) Martin Krischik
  2 siblings, 1 reply; 32+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-10-11  7:33 UTC (permalink / raw)


Le Mon, 10 Oct 2011 18:46:42 +0200, Adam Beneschan <adam@irvine.com> a  
écrit:

> When I took German in high school about 35 years
> ago, the character was called "ess-zed"
When I took German at school about 20 years ago, the teacher was saying  
ess-set.

Sorry for digressing.

-- 
“Syntactic sugar causes cancer of the semi-colons.”  [Epigrams on  
Programming — Alan J. — P. Yale University]
“Structured Programming supports the law of the excluded muddle.” [Idem]
Java: Write once, Never revisit




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 22:25     ` sharp ß " Randy Brukardt
@ 2011-10-11  7:36       ` Dmitry A. Kazakov
  2011-10-11  7:41         ` sharp ß " Yannick Duchêne (Hibou57)
  0 siblings, 1 reply; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-11  7:36 UTC (permalink / raw)


On Mon, 10 Oct 2011 17:25:17 -0500, Randy Brukardt wrote:

> Also note that even if the conversion was allowed, the identifier "acceß" 
> would be illegal: 2.3(5.3/3) [and the equivalent rule in Ada 2005] makes it 
> illegal to have an identifier that is identical to a reserved word. And 
> reserved words use character-by-character case conversion - any case where 
> the number of characters change is not considered. Since Ada 2012 uses 
> "simple case folding", 2.3(5.3/3) doesn't have any impact, but if it had 
> used "full case folding", it would prevent words like the above.

One still can fool it by using Cyrillic a (0430), c (0441), e (0435).

P.S. Shouldn't we ask the Unicode consortium to introduce programming
languages page with "access" as single character (and "goto", "if", "then",
"begin", "end", ... etc). They surely would enjoy it, since they did just
so for Roman numerals, degree symbols (°C, °F) etc. That would make the
mess perfect! (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11  7:36       ` Dmitry A. Kazakov
@ 2011-10-11  7:41         ` Yannick Duchêne (Hibou57)
  2011-10-11  8:33           ` Dmitry A. Kazakov
  0 siblings, 1 reply; 32+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2011-10-11  7:41 UTC (permalink / raw)


Le Tue, 11 Oct 2011 09:36:31 +0200, Dmitry A. Kazakov  
<mailbox@dmitry-kazakov.de> a écrit:
> One still can fool it by using Cyrillic a (0430), c (0441), e (0435).

But Cyrillic letters do not look the same:  
http://en.wikipedia.org/wiki/Cyrillic#Letters

What do you mean ?

-- 
“Syntactic sugar causes cancer of the semi-colons.”  [Epigrams on  
Programming — Alan J. — P. Yale University]
“Structured Programming supports the law of the excluded muddle.” [Idem]
Java: Write once, Never revisit




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11  7:41         ` sharp ß " Yannick Duchêne (Hibou57)
@ 2011-10-11  8:33           ` Dmitry A. Kazakov
  2011-10-11 20:32             ` sharp ß " Randy Brukardt
  0 siblings, 1 reply; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-11  8:33 UTC (permalink / raw)


On Tue, 11 Oct 2011 09:41:44 +0200, Yannick Duchêne (Hibou57) wrote:

> Le Tue, 11 Oct 2011 09:36:31 +0200, Dmitry A. Kazakov  
> <mailbox@dmitry-kazakov.de> a écrit:
>> One still can fool it by using Cyrillic a (0430), c (0441), e (0435).
> 
> But Cyrillic letters does not look the same:  
> http://en.wikipedia.org/wiki/Cyrillic#Letters
> 
> What do you mean ?

ассеss  (typed in Cyrillic, except for "ss", if you have a Unicode reader)

Some Cyrillic letters look the same. Many are simply identical. There are
identical-looking letters with a different phonetic meaning than their
Latin counterparts. Here is an incomplete list of same-looking letters.
The left side is Latin, the right side is Cyrillic.

a = а
y = у (but means "u")
k = к (has shorter vertical dash, capital letters are identical)
e = е
h ~ н (capital letters are identical, the meaning is "n")
x = х (but means "h")
b ~ в (capital letters are identical, the meaning is "v")
p = р (the meaning is "r")
o = о
c = с (the meaning is "s")
m = м (capital letters are identical)
t = т (capital letters are identical)

Earlier 7-bit Cyrillic encodings used phonetic equivalence of letters. For
example, in KOI-7 (КОИ-7) the codes of the two A's differ in only one bit.
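
A small sketch that makes the lookalike visible to a program. It is written in Python with the standard `unicodedata` module purely for illustration; the string is built from the code points named earlier in the thread (0430, 0441, 0435), and the variable names are made up for this example:

```python
import unicodedata

latin = "access"
# Cyrillic а (U+0430), с (U+0441), с (U+0441), е (U+0435), then Latin "ss".
lookalike = "\u0430\u0441\u0441\u0435" + "ss"

# The two strings render alike in many fonts but compare unequal.
print(latin == lookalike)

# Listing code points and character names shows which letters are which.
for ch in lookalike:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Neither case folding nor NFKC normalization maps Cyrillic onto Latin.
same_after_folding = latin.casefold() == lookalike.casefold()
same_after_nfkc = unicodedata.normalize("NFKC", lookalike) == latin
print(same_after_folding, same_after_nfkc)
```

The first four characters are reported as CYRILLIC SMALL LETTER A, ES, ES and IE, so a comparison based on code points treats the word as entirely different from the Latin one.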

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11  7:33   ` sharp ß and ss in Ada keywords like ACCESS Yannick Duchêne (Hibou57)
@ 2011-10-11 14:32     ` Adam Beneschan
  0 siblings, 0 replies; 32+ messages in thread
From: Adam Beneschan @ 2011-10-11 14:32 UTC (permalink / raw)


On Oct 11, 12:33 am, Yannick Duchêne (Hibou57)
<yannick_duch...@yahoo.fr> wrote:
> Le Mon, 10 Oct 2011 18:46:42 +0200, Adam Beneschan <a...@irvine.com> a  
> écrit:
>
> > When I took German in high school about 35 years
> > ago, the character was called "ess-zed"
>
> When I took German at school about 20 years ago, the teacher was saying  
> ess-set.

I intended for "zed" to be pronounced as it would be in German, i.e.
"tset".  So what I spelled as "ess-zed" would be pronounced "ess-
tset".

                         -- Adam





* Re: sharp ß and ss in Ada keywords like ACCESS (better not)
  2011-10-10 16:46 ` Adam Beneschan
  2011-10-10 18:23   ` Georg Bauhaus
  2011-10-11  7:33   ` sharp ß and ss in Ada keywords like ACCESS Yannick Duchêne (Hibou57)
@ 2011-10-11 17:26   ` Martin Krischik
  2011-10-12 12:34     ` Georg Bauhaus
  2 siblings, 1 reply; 32+ messages in thread
From: Martin Krischik @ 2011-10-11 17:26 UTC (permalink / raw)


Am 10.10.2011, 18:46 Uhr, schrieb Adam Beneschan <adam@irvine.com>:

> And my understanding was
> that the origins of the "sharp s" was that it was a combination of s
> and z

Indeed it is the s and z written close together of Fraktur calligraphic:

http://en.wikipedia.org/wiki/Fraktur

These days Fraktur is only used by mathematicians for vector and matrix  
variables. With the currently used typefaces the replacement characters  
for ß are ss. Note that all German special characters have replacements  
(ä=ae, ö=oe, ü=ue and ß=ss). But I don't think these replacements should  
be part of the Ada language.

Regards

Martin
-- 
Martin Krischik
mailto://krischik@users.sourceforge.net
https://sourceforge.net/users/krischik




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-10 18:23   ` Georg Bauhaus
  2011-10-10 22:25     ` sharp ß " Randy Brukardt
@ 2011-10-11 17:33     ` Martin Krischik
  2011-10-11 18:54       ` Adam Beneschan
  2011-10-12 13:03       ` Georg Bauhaus
  1 sibling, 2 replies; 32+ messages in thread
From: Martin Krischik @ 2011-10-11 17:33 UTC (permalink / raw)


Am 10.10.2011, 20:23 Uhr, schrieb Georg Bauhaus  
<rm.dash-bauhaus@futureapps.de>:

> However, there is substantial evidence that no z would ever be
> combined with an s such that the result is both forming an
> "ess-zed" shape,

You just have to use the right typeface:

http://de.wikipedia.org/wiki/Datei:Fraktur_letter_S.png
http://de.wikipedia.org/wiki/Datei:Fraktur_letter_Z.png

The middle s and then small z will form the ß

Regards

Martin
-- 
Martin Krischik
mailto://krischik@users.sourceforge.net
https://sourceforge.net/users/krischik




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11 17:33     ` sharp ß " Martin Krischik
@ 2011-10-11 18:54       ` Adam Beneschan
  2011-10-12 13:03       ` Georg Bauhaus
  1 sibling, 0 replies; 32+ messages in thread
From: Adam Beneschan @ 2011-10-11 18:54 UTC (permalink / raw)

On Oct 11, 10:33 am, "Martin Krischik"
<krisc...@users.sourceforge.net> wrote:
> Am 10.10.2011, 20:23 Uhr, schrieb Georg Bauhaus  
> <rm.dash-bauh...@futureapps.de>:
>
> > However, there is substantial evidence that no z would ever be
> > combined with an s such that the result is both forming an
> > "ess-zed" shape,
>
> You just have to use the right typeface:
>
> http://de.wikipedia.org/wiki/Datei:Fraktur_letter_S.png
> http://de.wikipedia.org/wiki/Datei:Fraktur_letter_Z.png
>
> The middle s and then small z will form the ß

Actually, I think that a handwritten "z" looks quite a bit like the
right side of ß (and the elongated "s" looks like the left side, so
there).  When they tried to teach me penmanship in grade school (this
is here in California), the "z" actually looked kind of like a 3 with
a loop at the bottom, not unlike the "z" Martin linked to.  I don't
know if they still teach that in schools.  I know my handwritten z's
these days don't look anything like what they tried to foist on me in
second grade (and neither do my s's, r's, f's, and probably several
other letters).  However, I still use a mnemonic my dad taught me to
tell whether a moon is waxing or waning; if the curve is the same as
the curve in a handwritten A, it's waning (A=abnehmen), while if it's
the same as the handwritten Z, the one that sort of looks like a 3,
it's waxing (Z=zunehmen).

Just to get us even further off topic.  If we want to get back to Ada,
we should start discussing whether the language should treat the
characters "z" and "3" as equivalent.
(Just kidding... :-) :-))

                         -- Adam


* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11  8:33           ` Dmitry A. Kazakov
@ 2011-10-11 20:32             ` Randy Brukardt
  2011-10-12  7:43               ` Dmitry A. Kazakov
  0 siblings, 1 reply; 32+ messages in thread
From: Randy Brukardt @ 2011-10-11 20:32 UTC (permalink / raw)



I believe (but haven't checked carefully), that Unicode case folding never 
treats Cyrillic and Latin characters as the same (even when it could). So 
this problem would not come up in Ada.

                    Randy.

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:1hvfz4aex1o6a$.1hibv8s1wdzy9.dlg@40tude.net...
> On Tue, 11 Oct 2011 09:41:44 +0200, Yannick Duchêne (Hibou57) wrote:
>
>> Le Tue, 11 Oct 2011 09:36:31 +0200, Dmitry A. Kazakov
>> <mailbox@dmitry-kazakov.de> a écrit:
>>> One still can fool it by using Cyrillic a (0430), c (0441), e (0435).
>>
>> But Cyrillic letters does not look the same:
>> http://en.wikipedia.org/wiki/Cyrillic#Letters
>>
>> What do you mean ?
>
> ассеss  (typed in Cyrillic, except for "ss", if you have an Unicode 
> reader)
>
> Some of Cyrillic letters look same. Many are just same. There are same
> letters with different phonetic meaning than their Latin counterparts. 
> Here
> is an incomplete list of same looking letters. The left side is Latin, the
> right side is Cyrillic.
>
> a = а
> y = у (but means "u")
> k = к (has shorter vertical dash, capital letters are identical)
> e = е
> h ~ н (capital letters are identical, the meaning is "n")
> x = х (but means "h")
> b ~ в (capital letters are identical, the meaning is "v")
> p = р (the meaning is "r")
> o = о
> c = с (the meaning is "s")
> m = м (capital letters are identical)
> t = т (capital letters are identical)
>
> Earlier 7-bit Cyrillic encodings used phonetic equivalence of letters. For
> example in KOI-7 (КОИ-7) the codes both A's differs in one bit.
>
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de 






* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11 20:32             ` sharp ß " Randy Brukardt
@ 2011-10-12  7:43               ` Dmitry A. Kazakov
  2011-10-12  9:42                 ` J-P. Rosen
  2011-10-12 20:17                 ` sharp ß " Randy Brukardt
  0 siblings, 2 replies; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-12  7:43 UTC (permalink / raw)

On Tue, 11 Oct 2011 15:32:35 -0500, Randy Brukardt wrote:

> I believe (but haven't checked carefully), that Unicode case folding never 
> treats Cyrillic and Latin characters as the same (even when it could).

That is for sure, they are different code points.

> So this problem would not come up in Ada.

Depends on what is considered problematic.

ß is not ss and, for that matter, it is not β (beta). Their capital letters,
if they exist (ß does not have a capital form), are different.

If Ada wished to introduce some rules of equivalence for Central European
languages, like ß=ss, ä=ae, ö=oe, and Eastern European languages, like a=а,
ё=e, and who knows what rules in other languages exist, that would be
hopeless.

But without them, the programmer could have *all* Ada keywords as
identifiers, by replacing appropriate Latin letters with Cyrillic ones.

Furthermore, identifiers looking perfectly same will be different and there
is a huge number of homonyms of practically *each* reasonable identifier. I
see it in breach with basic Ada design.

Since it is likely to stay that way, there seems to me absolutely no reason
to keep any reserved keywords in such a language. I would drop them, and
finally be able to declare something "Range".

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12  7:43               ` Dmitry A. Kazakov
@ 2011-10-12  9:42                 ` J-P. Rosen
  2011-10-12 12:09                   ` Dmitry A. Kazakov
  2011-10-12 20:17                 ` sharp ß " Randy Brukardt
  1 sibling, 1 reply; 32+ messages in thread
From: J-P. Rosen @ 2011-10-12  9:42 UTC (permalink / raw)


Le 12/10/2011 09:43, Dmitry A. Kazakov a écrit :
> If Ada wished to introduce some rules of equivalence for Central European
> languages, like ß=ss, ä=ae, ö=oe, and Eastern European languages, like a=а,
> ё=e, and who knows what rules in other languages exist, that would be
> hopeless.
> 
> But without them, the programmer could have *all* Ada keywords as
> identifiers, by replacing appropriate Latin letters with Cyrillic ones.
> 
> Furthermore, identifiers looking perfectly same will be different and there
> is a huge number of homonyms of practically *each* reasonable identifier. I
> see it in breach with basic Ada design.
> 
Too unlikely to care about. If you really fear something fishy in a
program, use AdaControl rule:
   check characters (not_ISO_646);

-- 
---------------------------------------------------------
           J-P. Rosen (rosen@adalog.fr)
Adalog a déménagé / Adalog has moved:
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12  9:42                 ` J-P. Rosen
@ 2011-10-12 12:09                   ` Dmitry A. Kazakov
  0 siblings, 0 replies; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-12 12:09 UTC (permalink / raw)


On Wed, 12 Oct 2011 11:42:50 +0200, J-P. Rosen wrote:

> Le 12/10/2011 09:43, Dmitry A. Kazakov a écrit :
>> If Ada wished to introduce some rules of equivalence for Central European
>> languages, like ß=ss, ä=ae, ö=oe, and Eastern European languages, like a=а,
>> ё=e, and who knows what rules in other languages exist, that would be
>> hopeless.
>> 
>> But without them, the programmer could have *all* Ada keywords as
>> identifiers, by replacing appropriate Latin letters with Cyrillic ones.
>> 
>> Furthermore, identifiers looking perfectly same will be different and there
>> is a huge number of homonyms of practically *each* reasonable identifier. I
>> see it in breach with basic Ada design.
>> 
> Too unlikely to care about.

It is not unlikely that somebody would like to name something "Intеrface",
which I presume is a valid identifier in Ada 2012.

> If you really fear something fishy in a
> program, use AdaControl rule:
>   check characters (not_ISO_646);

Or use lint for C programs? No tool can be a replacement for a language
problem. Tools are indicators of language problems.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: sharp ß and ss in Ada keywords like ACCESS (better not)
  2011-10-11 17:26   ` sharp ß and ss in Ada keywords like ACCESS (better not) Martin Krischik
@ 2011-10-12 12:34     ` Georg Bauhaus
  0 siblings, 0 replies; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-12 12:34 UTC (permalink / raw)


On 11.10.11 19:26, Martin Krischik wrote:
> Am 10.10.2011, 18:46 Uhr, schrieb Adam Beneschan <adam@irvine.com>:
> 
>> And my understanding was
>> that the origins of the "sharp s" was that it was a combination of s
>> and z
> 
> Indeed it is the s and z written close together of Fraktur calligraphic:
> 

It is not that easy (in theory):

http://www-nw.uni-regensburg.de/~brh22505/Ligatur/LIGATUR.HTM
http://de.wikipedia.org/wiki/ß
http://en.wikipedia.org/wiki/ß

> http://en.wikipedia.org/wiki/Fraktur
> 
> These days Fraktur is only used by mathematicians for vector and matrix
> variables.

http://www.faz.net  .-)





* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-11 17:33     ` sharp ß " Martin Krischik
  2011-10-11 18:54       ` Adam Beneschan
@ 2011-10-12 13:03       ` Georg Bauhaus
  2011-10-12 13:48         ` Dmitry A. Kazakov
  1 sibling, 1 reply; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-12 13:03 UTC (permalink / raw)


On 11.10.11 19:33, Martin Krischik wrote:
> Am 10.10.2011, 20:23 Uhr, schrieb Georg Bauhaus <rm.dash-bauhaus@futureapps.de>:
> 
>> However, there is substantial evidence that no z would ever be
>> combined with an s such that the result is both forming an
>> "ess-zed" shape,
> 
> You just have to use the right typeface:

(The crucial part that you did not quote was

 ``and also meaning s+z, in a word that stems from Latin,
  such as "process"''

If "ss" and "ß" were treated the same in computer language,
then Switzerland will serve as evidence that it is an actual
possibility working even in natural language.)  Programmers'
reaction to other equivalences such as "ae" ~ "ä" might be
less friendly.

But I imagine a language rule that addresses common sense
more than it does the mechanics of Unicode or the history
of writing; it might even be easy to implement:

 Any simple name shall include alphabetic characters from
 only one "alphabet".

Presuming some practical definition of "alphabet".




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12 13:03       ` Georg Bauhaus
@ 2011-10-12 13:48         ` Dmitry A. Kazakov
  2011-10-12 18:24           ` Georg Bauhaus
  0 siblings, 1 reply; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-12 13:48 UTC (permalink / raw)


On Wed, 12 Oct 2011 15:03:13 +0200, Georg Bauhaus wrote:

> But I imagine a language rule that addresses common sense
> more than it does the mechanics of Unicode or the history
> of writing; it might even be easy to implement:

Speaking of common sense one should simply drop ß and all other letters not
present in 7-bit ASCII.
 
>  Any simple name shall include alphabetic characters from
>  only one "alphabet".

That does not solve the problem. If ß=ss, then sch=sh, when matching two
simple names of different alphabets. How are you going to tag names?

   German#acceß#  
   US#access#

(:-))

> Presuming some practical definition of "alphabet".

For example?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de




* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12 13:48         ` Dmitry A. Kazakov
@ 2011-10-12 18:24           ` Georg Bauhaus
  2011-10-12 20:06             ` sharp ß " Randy Brukardt
  2011-10-12 20:48             ` sharp ß " Dmitry A. Kazakov
  0 siblings, 2 replies; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-12 18:24 UTC (permalink / raw)

On 12.10.11 15:48, Dmitry A. Kazakov wrote:
> On Wed, 12 Oct 2011 15:03:13 +0200, Georg Bauhaus wrote:
> 
>> But I imagine a language rule that addresses common sense
>> more than it does the mechanics of Unicode or the history
>> of writing; it might even be easy to implement:
> 
> Speaking of common sense one should simply drop ß and all other letters not
> present in 7-bit ASCII.

(Why character case? Let's save bits by dropping small letters. ;-)

> If ß=ss, then sch=sh, when matching two
> simple names of different alphabets. How are you going to tag names?
> 
>    German#acceß#  
>    US#access#
> 
> (:-))

The "alphabet" of both "access" and "acceß" (Horrible!) shall
be "Latin", see below.  Thus "access" is not Greek, and
"acceβ" will be an error, because it mixes two "alphabets",
Latin and Greek. The compiler will detect the syntax error.
The same will be true of "AССESS" or "'Rаnge", both being syntax
errors:

$ echo "AССESS" "'Rаnge" |od -c
0000000    A   С  **   С  **   E   S   S       '   R   а  **   n   g   e

Syntax errors are easily detected. The compiler can report
them very clearly:
E: The word "AССESS" uses characters from more than one alphabet

>> Presuming some practical definition of "alphabet".
> 
> For example?

I'd try a KISS definition of "alphabet". It does not involve
national languages, or meaning.

- Latin characters
- Cyrillic characters
- Greek characters
- Arabic (including Farsi) characters
- Hebrew characters
- Chinese characters (both old style, reformed style)
- Japanese characters; I think the rules might have to be
  a little more picky for Japanese identifiers?
- one of the alphabets used in India where all characters
  must come from a single Unicode group such as Devanagari
  or Gujarati
- Thai, Lao, ... characters
- ...

These groupings operate at some very basic level, they don't
care about the meaning of identifiers.  They ignore national
preferences.  Identifiers may not be in harmony with the
requirements of poetry, then.  But this should be fairly easy
to implement, since it is all about simple sets of characters.
They are not overlapping if one draws on ISO 10646. Hence
unions can be formed, and membership tests are easy.
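
A minimal sketch of such a membership test, in Python rather than Ada so
that it can be run as is. It rests on an assumption that is not part of
the proposal above: the "alphabet" of a letter is approximated by the
first word of its Unicode character name (LATIN, GREEK, CYRILLIC, ...),
because the Python standard library does not expose the Unicode Script
property. The function names are made up for illustration.

```python
import unicodedata

def alphabets(identifier):
    """Return the set of 'alphabets' used by the letters of identifier.

    Assumption: the first word of the Unicode character name stands in
    for the script. Digits and underscores are ignored.
    """
    found = set()
    for ch in identifier:
        if ch.isalpha():
            found.add(unicodedata.name(ch).split()[0])
    return found

def single_alphabet(identifier):
    """True if all letters of identifier come from one 'alphabet'."""
    return len(alphabets(identifier)) <= 1

# Escapes are used so that look-alike letters cannot be confused:
# U+00DF sharp s, U+03B2 Greek beta, U+0421 Cyrillic ES.
print(single_alphabet("acce\u00df"))       # True: Latin only
print(single_alphabet("acce\u03b2"))       # False: Latin and Greek
print(single_alphabet("A\u0421\u0421ESS")) # False: Latin and Cyrillic
```

With this heuristic, a Japanese identifier mixing kana and kanji would be
reported as mixed, which is one reason the rules would need to be more
picky there.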

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12 18:24           ` Georg Bauhaus
@ 2011-10-12 20:06             ` Randy Brukardt
  2011-10-12 20:48             ` sharp ß " Dmitry A. Kazakov
  1 sibling, 0 replies; 32+ messages in thread
From: Randy Brukardt @ 2011-10-12 20:06 UTC (permalink / raw)


"Georg Bauhaus" <rm.dash-bauhaus@futureapps.de> wrote in message 
news:4e95db62$0$6554$9b4e6d93@newsspool4.arcor-online.net...
...
> I'd try a KISS definition of "alphabet". It does not involve
> national languages, or meaning.

If it is not a 10646 character classification, it isn't practical to use in 
Ada (or any other programming language). Attempting to invent our own 
classifications would be a recurring nightmare as new characters are added 
to the character sets.

The Unicode people have spent a lot of effort thinking about the issues of 
extended characters in identifiers, and it makes the most sense to build on 
their work rather than inventing some of our own. After all, they're the 
character experts, not us.

                                       Randy.





^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12  7:43               ` Dmitry A. Kazakov
  2011-10-12  9:42                 ` J-P. Rosen
@ 2011-10-12 20:17                 ` Randy Brukardt
  2011-10-12 21:18                   ` Dmitry A. Kazakov
  1 sibling, 1 reply; 32+ messages in thread
From: Randy Brukardt @ 2011-10-12 20:17 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:1jwwodd91xfc8$.1xbuqt6j4xh5$.dlg@40tude.net...
> On Tue, 11 Oct 2011 15:32:35 -0500, Randy Brukardt wrote:
>
>> I believe (but haven't checked carefully), that Unicode case folding 
>> never
>> treats Cyrillic and Latin characters as the same (even when it could).
>
> That is for sure, they are different code points.

That has nothing to do with it. I think you don't understand very well how 
identifier equivalence is defined by Unicode (which is borrowed by Ada). 
Unicode defines something called "full case folding" (as well as "simple 
case folding"), it defines which characters are considered equivalent by 
case. Different code points can of course be equivalent (think 'J' and 'j' 
in the Latin characters, and also the similar Cyrillic characters).

>> So this problem would not come up in Ada.
>
> Depends on what is considered problematic.
>
> ß is not ss and for that matter, it is not β (beta). Their capital letters,
> if they exist (ß does not have capital case) are different.

Actually, "full case folding" says that "ß" == "ss" == "SS" etc. (That's why 
we decided that we had to diverge from the Unicode recommendations, as the 
above would be incompatible with Ada 95 programs. Thus Ada 2012 uses "simple 
case folding", which does not map different length sequences as equivalent.)
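
To illustrate the difference, here is a small Python sketch (not Ada, and
not how any Ada compiler implements the rule). Python's str.casefold()
applies Unicode full case folding. Python has no built-in simple case
folding, so str.lower() is used below only as a rough stand-in for a
comparison that leaves ß alone; that substitution, and the function
names, are assumptions made for the example.

```python
def same_under_full_folding(a, b):
    """Compare two names using Unicode full case folding."""
    return a.casefold() == b.casefold()

def same_when_lowercased(a, b):
    """Compare two names after plain lowercasing.

    Stand-in for simple case folding: it does not turn U+00DF into "ss".
    """
    return a.lower() == b.lower()

sharp_s_name = "Acce\u00df_Type"   # U+00DF is LATIN SMALL LETTER SHARP S
plain_name = "Access_Type"

print(same_under_full_folding(sharp_s_name, plain_name))  # True
print(same_when_lowercased(sharp_s_name, plain_name))     # False
```

Under the first comparison the two type names would clash; under the
second they remain distinct identifiers.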

> If Ada wished to introduce some rules of equivalence for Central European
> languages, like ß=ss, ä=ae, ö=oe, and Eastern European languages, like 
> a=?,
> ?=e, and who knows what rules in other languages exist, that would be
> hopeless.

If Unicode wanted to include such equivalences, Ada would have them too. 
It's not at all hopeless - we're simply going to do whatever Unicode 
recommends. (In this specific case, I believe that Unicode recommends not 
having equivalence.)

> But without them, the programmer could have *all* Ada keywords as
> identifiers, by replacing appropriate Latin letters with Cyrillic ones.

Yup. And so what? It's the price you pay for allowing Cyrillic characters in 
identifiers. And that is a choice completely out of our hands -- it is 
mandated by JTC 1 and SC 22. We couldn't have an Ada Standard at all without 
allowing this.

> Furthermore, identifiers looking perfectly same will be different and 
> there
> is a huge number of homonyms of practically *each* reasonable identifier. 
> I
> see it in breach with basic Ada design.

As mentioned, it is required as a part of all programming language 
standards. Is this a bad idea? Sure, I agree with you on that, but we have 
no choice.

J-P's tool or even a compiler-switch can prevent problematic identifiers. 
We're not allowed to if we want to be an International Standard (emphasis on 
"International"). [Or just use Janus/Ada; it's unlikely that I'll mess with 
the pain of these identifiers unless some customer demands it with $$$.]

                                       Randy.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12 18:24           ` Georg Bauhaus
  2011-10-12 20:06             ` sharp ß " Randy Brukardt
@ 2011-10-12 20:48             ` Dmitry A. Kazakov
  2011-10-12 22:56               ` sharp ß and ss in Ada keywords like AC CESS Georg Bauhaus
  1 sibling, 1 reply; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-12 20:48 UTC (permalink / raw)


On Wed, 12 Oct 2011 20:24:33 +0200, Georg Bauhaus wrote:

> On 12.10.11 15:48, Dmitry A. Kazakov wrote:
>> On Wed, 12 Oct 2011 15:03:13 +0200, Georg Bauhaus wrote:
>> 
>>> But I imagine a language rule that addresses common sense
>>> more than it does the mechanics of Unicode or the history
>>> of writing; it might even be easy to implement:
>> 
>> Speaking of common sense one should simply drop ß and all other letters not
>> present in 7-bit ASCII.
> 
> (Why character case? Let's save bits by dropping small letters. ;-)

This is what Ada 83 did, being case-agnostic.
 
>> If ß=ss, then sch=sh, when matching two
>> simple names of different alphabets. How are you going to tag names?
>> 
>>    German#acceß#  
>>    US#access#
>> 
>> (:-))
> 
> The "alphabet" of both "access" and "acceß" (Horrible!) shall
> be "Latin", see below.  Thus "access" is not Greek, and
> "acceβ" will be an error, because it mixes two "alphabets",
> Latin and Greek.

ß has nothing to do with Greek alphabet, it is a ligature promoted to a
separate character. 

> The compiler will detect the syntax error.

That was not my question. It was what to do with this:

   type Acceß_Type is access Integer;
   type Access_Type is access String;

Do these identifiers conflict?

   I : Integer; -- Latin I
   І : Integer; -- Ukrainian I
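
A quick check of the two letters, sketched in Python. NFKC normalization
plus case folding is used here only as a generic illustration of a
Unicode identifier comparison; it is an assumption of the example, not a
statement of Ada's rule, and the helper name is made up.

```python
import unicodedata

latin_i = "I"          # U+0049
cyrillic_i = "\u0406"  # U+0406, looks the same in many fonts

def folded(name):
    """Normalize to NFKC, then apply Unicode full case folding."""
    return unicodedata.normalize("NFKC", name).casefold()

print(unicodedata.name(latin_i))
print(unicodedata.name(cyrillic_i))
print(folded(latin_i) == folded(cyrillic_i))  # False
```

So the two declarations do not clash under such a comparison, although
they may be indistinguishable on screen.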

>>> Presuming some practical definition of "alphabet".
>> 
>> For example?
> 
> I'd try a KISS definition of "alphabet". It does not involve
> national languages, or meaning.
> 
> - Latin characters

Is ö Latin? Are k, u, w Latin? BTW, Latin script was all upper case.

> - Cyrillic characters

"Cyrillic characters" is a wild mixture of various characters and
ligatures (like German ß) from different national Cyrillic alphabets,
with borrowing from Greek, Latin and later inventions. There is no reason
to treat combinations of those as something cohesive.

> But this should be fairly easy
> to implement,

It is not about implementation, it is about understanding the rules without
looking into the categorization tables.

BTW, why "ΔT" should be illegal?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like ACCESS
  2011-10-12 20:17                 ` sharp ß " Randy Brukardt
@ 2011-10-12 21:18                   ` Dmitry A. Kazakov
  0 siblings, 0 replies; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-12 21:18 UTC (permalink / raw)


On Wed, 12 Oct 2011 15:17:21 -0500, Randy Brukardt wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
> news:1jwwodd91xfc8$.1xbuqt6j4xh5$.dlg@40tude.net...
>> On Tue, 11 Oct 2011 15:32:35 -0500, Randy Brukardt wrote:
>>
>>> I believe (but haven't checked carefully), that Unicode case folding never
>>> treats Cyrillic and Latin characters as the same (even when it could).
>>
>> That is for sure, they are different code points.
> 
> That has nothing to do with. I think you don't understand very well how 
> identifier equivalence is defined by Unicode (which is borrowed by Ada). 
> Unicode defines something called "full case folding" (as well as "simple 
> case folding"), it defines which characters are considered equivalent by 
> case. Different code points can of course be equivalent (think 'J' and 'j' 
> in the Latin characters, and also the similar Cyrillic characters).

I see what you mean. I should have said: different code pages. It seems
that full folding does not cross their border. However maybe they just
didn't yet define proper rules for Cyrillic letters. For example, the
proper full folding for Cyrillic IO (ё) should be Cyrillic IE (е). Who knows
what they will do later? There are Cyrillic J and j!

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like AC CESS
  2011-10-12 20:48             ` sharp ß " Dmitry A. Kazakov
@ 2011-10-12 22:56               ` Georg Bauhaus
  2011-10-13  8:10                 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-12 22:56 UTC (permalink / raw)

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote:

>> "acceβ" will be an error, because it mixes two "alphabets",
>> Latin and Greek.
> 
> ß has nothing to do with Greek alphabet, it is a ligature promoted to a
> separate character. 

"acceβ" has a Greek BETA, and the compiler would have noticed.

>> The compiler will detect the syntax error.
> That was not my question. It was what to do with this:
OK, different question.

>    type Acceß_Type is access Integer;
>    type Access_Type is access String;

I'd prefer them to be the same in this particular 
case, since the Swiss model (which is without ß)
is working.

> Do these identifiers conflict?
> 
>    I : Integer; -- Latin I
>    І : Integer; -- Ukrainian I

Different alphabets, different identifiers. Obfuscation is
the work of the programmer. A requirement to use only 
one "alphabet" for an identifier is a countermeasure.
It is easy to write very confusing Ada in ASCII, too.

> Is ö Latin? Are k, u, w Latin? BTW, Latin script was all upper case.

Yes, they are to be classified as Latin, because
programmers are used to it, and the relevant standards
apply, too.  W is double-v or double-u if you insist that
history should play a part, etc. but this only complicates
the matter for programming, without need, IMHO.

>> - Cyrillic characters
> 
> "Cyrillic characters" is a wild mixture of various characters and
> ligatures (like German ß) from different national Cyrillic alphabets,
> with borrowing from Greek, Latin and later inventions. There is no reason
> to treat combinations of those as something cohesive.

I think it is reasonable to define useful, simple sets of the
characters that people will consider related: It is easier to
write a text message in Danish on a dated phone if this
phone has French keys rather than Ukrainian keys. 
French is better in this case because ø might not be
on the keyboard, but most other characters are. Whether
Danish or French, the Inclusive Latin Alphabet lets me
write in either language. The inclusive Cyrillic Alphabet will
let others do the same. That is, write using sets of characters
that people will consider related, in this practical sense,
when using Slavic languages.

The real trouble is with geeky identifiers that,
for reasons of being cool, must allow the mixing of
alphabets, because employing transcription, transliteration,
or translation is about as cool as not writing
geeky identifiers at all...

>> But this should be fairly easy
>> to implement,
> 
> It is not about implementation, it is about understanding the rules without
> looking into the categorization tables.

If a word looks like a mix of Cyrillic characters,
then, to a compiler, it is Cyrillic. A programmer
seeing Cyrillic characters will, on average, be
right in assuming that he is seeing some
identifier written in some Slavic language.
No more, no less. If the rule says that only
one alphabet is permitted per simple name,
it reduces complexity for the reader of Slavic
language.

> 
> BTW, why "ΔT" should be illegal?

Yes, illegal if the alphabet rules apply. The solution
is just like in real life: 
When it is an operator, really, make it one.
When it is some prefixed hint, other conventions
have worked well, even in textbooks.
When "delta-t" is the name of something in the
problem domain, chances are one can
find a specific name that is better than "delta-t".

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like AC CESS
  2011-10-12 22:56               ` sharp ß and ss in Ada keywords like AC CESS Georg Bauhaus
@ 2011-10-13  8:10                 ` Dmitry A. Kazakov
  2011-10-13 12:13                   ` Georg Bauhaus
  0 siblings, 1 reply; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-13  8:10 UTC (permalink / raw)

On 12 Oct 2011 22:56:38 GMT, Georg Bauhaus wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote:
> 
>>> "acceβ" will be an error, because it mixes two "alphabets",
>>> Latin and Greek.
>> 
>> ß has nothing to do with Greek alphabet, it is a ligature promoted to a
>> separate character. 
> 
> "acceβ" has a Greek BETA, and the compiler would have noticed.

Why should it be? How is "acceβ" worse than "acceß"? 

>>    type Acceß_Type is access Integer;
>>    type Access_Type is access String;
> 
> I'd prefer them to be the same in this particular 
> case, since the Swiss model (which is without ß)
> is working.

What is the reason for them to be same?

>> Do these identifiers conflict?
>> 
>>    I : Integer; -- Latin I
>>    І : Integer; -- Ukrainian I
> 
> Different alphabets, different identifiers.

How do you know in which alphabet is "Mass"? Why should it conflict with
"Maß" for some French programmer?

Rules should be *reasonable*. There cannot be any reasonable rule why ß=ss,
but I /= І.

>> Is ö Latin? Are k, u, w Latin? BTW, Latin script was all upper case.
> 
> Yes, they are to be classified as Latin, because
> programmers are used to it, and the relevant standards
> apply, too.  W is double-v or double-u if you insist that
> history should play a part, etc. but this only complicates
> the matter for programming, without need, IMHO.

If "Latin" does not mean Latin, then you need yet another nonsensical rule
to redefine it.

>>> - Cyrillic characters
>> 
>> "Cyrillic characters" is a wild mixture of various characters and
>> ligatures (like German ß) from different national Cyrillic alphabets,
>> with borrowing from Greek, Latin and later inventions. There is no reason
>> to treat combinations of those as something cohesive.
> 
> I think it is reasonable to define useful, simple sets of the
> characters that people will consider related:

Who are these people? How would you do that and why should Ada language
care?

BTW 1, show me a natural language alphabet in which "_" is a letter?

BTW 2, "'" is a letter in some Russian texts, it is used as a letter (a
part of written word) in German, English and, I presume, in many other
languages I don't know. Nevertheless, it is not a letter according to
Unicode and not a letter in Ada.

> That is, write using sets of characters
> that people will consider related, in this practical sense,
> when using Slavic languages.

Except that half of those languages use no Cyrillic letters at all (e.g.
Polish).

>>> But this should be fairly easy
>>> to implement,
>> 
>> It is not about implementation, it is about understanding the rules without
>> looking into the categorization tables.
> 
> If a word looks like a mix of Cyrillic characters,

You cannot see characters, you do glyphs. Glyphs used in European languages
are massively shared because all alphabets used there stem from one root
and used to influence each other throughout all their history. You cannot
safely recognize alphabet looking at a single word.  

> A programmer
> seeing Cyrillic characters will, on average, be
> right in assuming that he is seeing some
> identifier written in some Slavic language.

Program legality based on statistic analysis? That must be a lot of fun!

>> BTW, why "ΔT" should be illegal?
> 
> Yes, illegal if the alphabet rules apply.

*Why* should they apply? You should give some basic principles for your
rules, language independent ones. E.g. readability, simplicity of use etc.
How does "I is not I" improve readability?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like AC CESS
  2011-10-13  8:10                 ` Dmitry A. Kazakov
@ 2011-10-13 12:13                   ` Georg Bauhaus
  2011-10-13 13:25                     ` Dmitry A. Kazakov
  0 siblings, 1 reply; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-13 12:13 UTC (permalink / raw)

On 13.10.11 10:10, Dmitry A. Kazakov wrote:

> How is "acceβ" worse than "acceß"? 

The first identifier mixes two different "alphabets";
The second identifier uses variant spelling.
The second identifier can be changed without debate.
This is how "acceβ" is worse than "acceß".

The first identifier, mixing two "alphabets", is
like a literal mixing Roman numerals and Arabic numerals
to form a single numeric literal:

   Some_Year : constant := MCD92;

Yeah, that's fun. And there are no Chinese digits
in it, so Europeans and Americans will likely recognize the
intent. But otherwise? What's the point when mixing two
numeric alphabets?

>>>    type Acceß_Type is access Integer;
>>>    type Access_Type is access String;
>>
>> I'd prefer them to be the same in this particular 
>> case, since the Swiss model (which is without ß)
>> is working.
> 
> What is the reason for them to be same?

"the Swiss model (which is without ß) is working."
It is more than technical work, see below.

> How do you know in which alphabet is "Mass"? Why should it conflict with
> "Maß" for some French programmer?

The identifiers shouldn't be in conflict, but Ada
makes them be in conflict, Randy has stated one reason.

I can think of two reasons for ss = ß but I /= І:

1) I /= І, since they are from "alphabets" that real people
   think are different. To make the example less artificial,
   Let Αδα /= Ada.  Put your projects at risk and hire
   programmers who would write Aδα (A["03B4"]["03B1"]).
   The compiler will help you find them out.

2) ss = ß because real people think and act as though
   they are the same, and, importantly, more so (note
   the non-binary, comparative phrase) WRT ss = ß than
   WRT ä = ae, since absence of ä is considered a computer
   thingy, but equivalence of ss and ß is well established
   with or without computer. It is a different issue.

Engineers have projected the situation onto the technical
axis and wisely introduced a capital ß.  This will not
resolve (2) above. The programming situation is placed
somewhere along several axes, not just a technical one.
I therefore speculate that capital ß will add another
power of two to the complexity of the issue in real life.
(Or, finally, make us learn how to use Unicode properly.)

"Mass" has four Latin characters, "Maß" has three,
if you ask any child; they are both written using
Latin characters if you ask any programmer, not Greek
not Hebrew, not Cyrillic, or anything, just Latin.

> If "Latin" does not mean Latin, then you need yet another nonsensical rule
> to redefine it.

"Latin" is here meant to refer to the general thing.
"Latin characters used in Europe" is pretty clear,
and no one will sue you if you include ä or ł.

Declaring simple unions of sections from Unicode is easy,
and consistent.  You just ask some programmers who know
both character sets and who have been programming for some
time.  The intent is not to bend existing logic, the
intent is to have rules that prevent too much fun with
identifiers (note the comparative).

Glyphs don't help at all, either when constructing an issue
or when resolving an issue. For example, in my terminal window,
the second І looks like |, it does not have serifs. I does.

> Who are these people? 

The people who influence standards.

> Except that half of those languages use no Cyrillic letters at all (e.g.
> Polish).

Poles will still consider sets of Cyrillic characters
to be related, I should think.  Possibly more so than people
West of Poland if Poles more frequently know another Slavic
language.

>> If a word looks like a mix of Cyrillic characters,
> 
> You cannot see characters, you do glyphs.

That's techno-speak again, but programmers see characters
if you ask them. Techno-think is not alone in establishing
writing habits.

> You cannot
> safely recognize alphabet looking at a single word. 

I am looking at programs, not at single words.

> 
>> A programmer
>> seeing Cyrillic characters will, on average, be
>> right in assuming that he is seeing some
>> identifier written in some Slavic language.
> 
> Program legality based on statistic analysis? That must be a lot of fun!

We employ tons of statistics when reading text.
Reading programs is also best when it is fun. We also employ
averaging when writing programs (which pattern has worked best,
what is a good name for this thing, what has worked in the past,
etc.)  Hoare has suggested the use of a macro named
PRELIMINARY_ASSUMPTION, IIRC, so "assuming" seems a normal
attitude when programming, in general.

__
[*] I am referring to the German writers making a scene vis-à-vis
the spelling reform and talking seriously about the destruction of
culture.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like AC CESS
  2011-10-13 12:13                   ` Georg Bauhaus
@ 2011-10-13 13:25                     ` Dmitry A. Kazakov
  2011-10-13 15:18                       ` Georg Bauhaus
  0 siblings, 1 reply; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-13 13:25 UTC (permalink / raw)


On Thu, 13 Oct 2011 14:13:36 +0200, Georg Bauhaus wrote:

> On 13.10.11 10:10, Dmitry A. Kazakov wrote:
> 
>> How is "acceβ" worse than "acceß"? 
> 
> The first identifier mixes two different "alphabets";
> The second identifier uses variant spelling.
> The second identifier can be changed without debate.
> This is how "acceβ" is worse than "acceß".

Not convincing. It refers to "alphabets" and "spellings". Why should
anybody care about them?

> The first identifier, mixing two "alphabets", is
> like a literal mixing Roman numerals and Arabic numerals
> to form a single numeric literal:
> 
>    Some_Year : constant := MCD92;
>
> Yeah, that's fun. And there are no Chinese digits
> in it, so Europeans and Americans will likely recognize the
> intent.

Motor control device, the model of the year 1992?

> But otherwise? What's the point when mixing two
> numeric alphabets?

9 and 2 are not letters. What is the point to mix them at all? E.g. in
"I2"? I don't care if I2 is improperly spelt in any natural language.

>>>>    type Acceß_Type is access Integer;
>>>>    type Access_Type is access String;
>>>
>>> I'd prefer them to be the same in this particular 
>>> case, since the Swiss model (which is without ß)
>>> is working.
>> 
>> What is the reason for them to be same?
> 
> "the Swiss model (which is without ß) is working."

I bet 90% of Ada users could not care less.

>> How do you know in which alphabet is "Mass"? Why should it conflict with
>> "Maß" for some French programmer?
> 
> The identifiers shouldn't be in conflict, but Ada
> makes them be in conflict, Randy has stated one reason.
> 
> I can think of two reasons for ss = ß but I /= І:
> 
> 1) I /= І, since they are from "alphabets" that real people
>    think are different.

Show me one, who thinks they are different without hexadecimal editor. 

>   To make the example less artificial,
>    Let Αδα /= Ada.  Put your projects at risk and hire
>    programmers who would write Aδα (A["03B4"]["03B1"]).
>    The compiler will help you find them out.

If some nonsensical language rules put projects at risk, then, maybe, there
is something wrong with these rules?

> 2) ss = ß because real people think and act as though
>    they are the same, and, importantly, more so (note
>    the non-binary, comparative phrase) WRT ss = ß than
>    WRT ä = ae, since absence of ä is considered a computer
>    thingy, but equivalence of ss and ß is well established
>    with or without computer. It is a different issue.

Real people also think that sch=sh, kn=n (at the beginning of the word),
oo=u, ee=i and ad infinitum.

> "Mass" has four Latin characters, "Maß" has three,

So why are they equivalent?

>> If "Latin" does not mean Latin, then you need yet another nonsensical rule
>> to redefine it.
> 
> "Latin" is here meant to refer to the general thing.
> "Latin characters used in Europe" is pretty clear,

Does this include Greece, Serbia, Bulgaria?

> and no one will sue you if you include ä or ł.

That must be the reason? No one could if I exclude them and ß too.

> Declaring simple unions of sections from Unicode is easy,
> and consistent.

Again, what is the rationale? It is quite easy to jump out of the 10th
floor window. The result will be very consistent too. Why should anybody
do this?

>> Who are these people? 
> 
> The people who influence standards.

Sure, who would care about users of the standards... (:-))

>>> If a word looks like a mix of Cyrillic characters,
>> 
>> You cannot see characters, you do glyphs.
> 
> That's techno-speak again, but programmers see characters
> if you ask them.

Nope, it is a physiological fact that people see glyphs.

>> You cannot
>> safely recognize alphabet looking at a single word. 
> 
> I am looking at programs, not at single words.

Does this mean that a program may not use several alphabets? Great, the
package Ada.Numerics is illegal. I always knew it!

>>> A programmer
>>> seeing Cyrillic characters will, on average, be
>>> right in assuming that he is seeing some
>>> identifier written in some Slavic language.
>> 
>> Program legality based on statistic analysis? That must be a lot of fun!
> 
> We employ tons of statistics when reading text.

This has nothing to do with the validity of such texts. I don't care about
Swiss model, I do about separate compilation. I don't want the legality of
components (tested, verified, validated) be randomly dependent on other
parts by mere placing them into one project.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: sharp ß and ss in Ada keywords like AC CESS
  2011-10-13 13:25                     ` Dmitry A. Kazakov
@ 2011-10-13 15:18                       ` Georg Bauhaus
  2011-10-13 19:17                         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 32+ messages in thread
From: Georg Bauhaus @ 2011-10-13 15:18 UTC (permalink / raw)

On 13.10.11 15:25, Dmitry A. Kazakov wrote:
> It refers to "alphabets" and "spellings". Why should
> anybody care about them?

Many German programmers *do* care about them.
Most issues arise in proportion to the ease with which
a language allows mixing of alphabets: If you allow all
kinds of identifiers, people will try them.

>>    Some_Year : constant := MCD92;
>>
>> Yeah, that's fun. And there are no Chinese digits
>> in it, so Europeans and Americans will likely recognize the
>> intent.
> 
> Motor control device, the model of the year 1992?

More clearly,

   Some_Year : constant Natural := MCD92;

>> But otherwise? What's the point when mixing two
>> numeric alphabets?
> 
> 9 and 2 are not letters.

M, C, and D, in this hypothetical example, are digits,
not letters; the example is not that far-fetched, since
Roman numerals are well known.  The point is not that
identifiers can have digits in them.  The point is that
it has seemed reasonable to let the digits of numeric literals
be drawn from the "alphabet" '0' .. '9', and no others.

IOW, there is a single "alphabet" of digits, other digits
will not be acceptable when forming numeric literals.

I think this is reasonable.  The one-alphabet rule for
simple names is not much different.

>> "the Swiss model (which is without ß) is working."
> 
> I bet 90% of Ada users could not care less.

It would be interesting to learn what your bet rests on, that is, why
- 90% will not care
- 10% do care

I agree that many do not care about anything other than ASCII.
With the usual consequences that carelessness tends to foster
regarding the I/O of non-computer text.

>> 1) I /= І, since they are from "alphabets" that real people
>>    think are different.
> 
> Show me one, who thinks they are different without hexadecimal editor. 

My terminal, as I said, shows the difference between I and І very
clearly. Any compiler shows they are different identifiers.
As I said, obfuscation is programmer's business.

Starting to feel like a broken record, the same goes for ASCII
l, 1, i, I. Consequently, GNAT sources show a rule: I is not the
name of an indexing variable...  I'm just considering something
that is a generalization, simple, practical, worldly, neither purporting
to solve issues of discipline, nor pretending to erase all obfuscation.

> If some nonsensical language rules put projects at risk, then, maybe, there
> is something wrong with these rules?

I have shown how to apply the rules and how they don't do any harm.
They reduce complexity. They hardly put anything at risk, I think,
since they only reduce the number of identifiers (do not mix "alphabets").

> Real people also think that sch=sh, kn=n (at the beginning of the word),
> oo=u, ee=i and ad infinitum.

This is demonstrably false, since, for example, those who write(!)
using Latin will never in fact replace "kn" with "n".

>> "Mass" has four Latin characters, "Maß" has three,
> 
> So why are they equivalent?

Because people say, in large numbers, that they are equivalent.

>>> If "Latin" does not mean Latin, then you need yet another nonsensical rule
>>> to redefine it.
>>
>> "Latin" is here meant to refer to the general thing.
>> "Latin characters used in Europe" is pretty clear,
> 
> Does this include Greece, Serbia, Bulgaria?

This should be clear from the original list; also
"Latin characters used in Europe" is pretty clear
to anyone not trying to twist things.

>> Declaring simple unions of sections from Unicode is easy,
>> and consistent.
> 
> Again, what is the rationale?

To reduce complexity of identifiers, and to make reading easier.

>> but programmers see characters
>> if you ask them.
> 
> Nope, it is a physiological fact that people see glyphs.

A)  "It's the 3rd character from the right".

B)  "It's the 3rd glyph from the right".

How likely is it that any real programmer will say sentence
B, and not A?

>>> You cannot
>>> safely recognize alphabet looking at a single word. 
>>
>> I am looking at programs, not at single words.
> 
> Does this mean that a program may not use several alphabets?

Of course not, as stated, this applies to simple names.
A simple name alone is not a program, and has little meaning.

> I don't care about
> Swiss model, I do about separate compilation. I don't want the legality of
> components (tested, verified, validated) be randomly dependent on other
> parts by mere placing them into one project.

We could never reuse anything but Ada 83 units: The legality
of components would have to be re-established under the rules
of more recent Adas.
  Of course, everything about "alphabets" applies to software
that is written, or changed systematically. Somewhat like turning
a C program into a MISRA-C program, or an Ada program into one
to which a profile applies.  This changes the language, but you
do it for a reason.


* Re: sharp ß and ss in Ada keywords like AC CESS
  2011-10-13 15:18                       ` Georg Bauhaus
@ 2011-10-13 19:17                         ` Dmitry A. Kazakov
  0 siblings, 0 replies; 32+ messages in thread
From: Dmitry A. Kazakov @ 2011-10-13 19:17 UTC (permalink / raw)

On Thu, 13 Oct 2011 17:18:31 +0200, Georg Bauhaus wrote:

> On 13.10.11 15:25, Dmitry A. Kazakov wrote:
>> It refers to "alphabets" and "spellings". Why should
>> anybody care about them?
> 
> Many German programmers *do* care about them.

See below.

>> 9 and 2 are not letters.
> 
> M, C, and D, in this hypothetical examples, are digits,

They are not. See:

http://en.wikipedia.org/wiki/Number_Forms

> IOW, there is a single "alphabet" of digits, other digits
> will not be acceptable when forming numeric literals.
> 
> I think this is reasonable.

Why do you think so? It makes no sense from the standpoint of any
particular language (Latin and modern numerals are not the only existing
systems). Nor does it make sense according to Unicode, which has Roman
numerals, fractions, and superscript and subscript numerals. If Ada
sincerely wanted to embrace Unicode, the following should be legal:

   Half : constant Float := ½;

BTW, Hebrew has its own numerals, which are widely used in mathematics for
cardinal numbers (e.g. aleph-0, which happily mixes two different numeral
systems!), and Hebrew is written right to left. So many Israeli programmers
might want Ada identifiers *starting* with digits rather than ending with
them. Why should ß-rules have preference?

>>> "the Swiss model (which is without ß) is working."
>> 
>> I bet 90% of Ada users could not care less.
> 
> It will be interesting to learn what your bet is thus why
> - 90% will not care

Because at least 90% have never heard of ß and never will.

>>> 1) I /= І, since they are from "alphabets" that real people
>>>    think are different.
>> 
>> Show me one, who thinks they are different without hexadecimal editor. 
> 
> My terminal, as I said, shows the difference between I and І very
> clearly.

You have no font installed. Is it rendered as a box? My computer shows no
difference in Times New Roman.

> Any compiler shows they are different identifiers.

Rather than showing them as the same.

> Starting to feel like a broken record, the same goes for ASCII
> l, 1, i, I.

These are clearly different in any fixed-width font. Anyway, even if it
were true, this cannot serve as an argument for multiplying homographs.
Two wrongs do not make a right.

>>> "Mass" has four Latin characters, "Maß" has three,
>> 
>> So why are they equivalent?
> 
> Because people say, in large numbers, that they are equivalent.

This requires a proof.

People in large numbers believe in various things, most of which are false.
But find anyone who would agree with you that:

   Mass_Spectrograph = Maß_Spectrograph
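
To make the objection concrete, here is a minimal Ada sketch of what a
ß=ss equivalence would do to these two names. The Fold function is a
hypothetical illustration of such a rule, not anything taken from the
Ada standard or from a compiler; the sharp s is built with 'Val so the
example does not depend on the source file encoding.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Sharp_S_Folding is

   --  U+00DF LATIN SMALL LETTER SHARP S
   Sharp_S : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#00DF#);

   --  Hypothetical folding rule: replace every sharp s by "ss".
   function Fold (Name : Wide_Wide_String) return Wide_Wide_String is
      --  Worst case: every character is a sharp s.
      Result : Wide_Wide_String (1 .. 2 * Name'Length);
      Last   : Natural := 0;
   begin
      for I in Name'Range loop
         if Name (I) = Sharp_S then
            Result (Last + 1 .. Last + 2) := "ss";
            Last := Last + 2;
         else
            Last := Last + 1;
            Result (Last) := Name (I);
         end if;
      end loop;
      return Result (1 .. Last);
   end Fold;

   With_SS      : constant Wide_Wide_String := "Mass_Spectrograph";
   With_Sharp_S : constant Wide_Wide_String :=
     "Ma" & Sharp_S & "_Spectrograph";
begin
   --  Prints FALSE: as written, the names differ.
   Put_Line ("Same as written: "
             & Boolean'Image (With_SS = With_Sharp_S));

   --  Prints TRUE: under the folding rule they become one identifier.
   Put_Line ("Same after fold: "
             & Boolean'Image (Fold (With_SS) = Fold (With_Sharp_S)));
end Sharp_S_Folding;
```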

>>>> If "Latin" does not mean Latin, then you need yet another nonsensical rule
>>>> to redefine it.
>>>
>>> "Latin" is here meant to refer to the general thing.
>>> "Latin characters used in Europe" is pretty clear,
>> 
>> Does this include Greece, Serbia, Bulgaria?
> 
> This should be clear from the original list; also
> "Latin characters used in Europe" is pretty clear
> to anyone not trying to twist things.

How is it clear? Greece has not been expelled from the eurozone and Cyprus
will stay... (:-))

>>> Declaring simple unions of sections from Unicode is easy,
>>> and consistent.
>> 
>> Again, what is the rationale?
> 
> To reduce complexity of identifiers, and to make reading easier.

Like I /= І rule?

>>> but programmers see characters
>>> if you ask them.
>> 
>> Nope, it is a physiological fact that people see glyphs.
> 
> A)  "It's the 3rd character from the right".
> 
> B)  "It's the 3rd glyph from the right".
> 
> How likely is it that any real programmer will say sentence
> B, and not A?

That does not change the fact. The programmer would also say that the sun is
"rising" and that the program is full of "bugs".

People see glyphs because they are visual representations of characters or
their parts. You need very good arguments for why a programming language
should be constructed so that two different programs have the same visual
representation in standard text editors.

>>>> You cannot
>>>> safely recognize alphabet looking at a single word. 
>>>
>>> I am looking at programs, not at single words.
>> 
>> Does this mean that a program may not use several alphabets?
> 
> Of course not, as stated, this applies to simple names.
> A simple name alone is not a program, and has little meaning.

Thus we return to the point that the alphabet cannot be identified.

>> I don't care about
>> Swiss model, I do about separate compilation. I don't want the legality of
>> components (tested, verified, validated) be randomly dependent on other
>> parts by mere placing them into one project.
> 
> We could never reuse anything but Ada 83 units: The legality
> of components would have to be re-established under the rules
> of more recent Adas.

Are you serious? Some really vital things are not even considered due to
sacred compatibility and you are proposing to forget about that in favor of
silly ß=ss?

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


Thread overview: 32+ messages
2011-10-10 16:30 sharp ß and ss in Ada keywords like ACCESS Georg Bauhaus
2011-10-10 16:46 ` Adam Beneschan
2011-10-10 18:23   ` Georg Bauhaus
2011-10-10 22:25     ` sharp ß " Randy Brukardt
2011-10-11  7:36       ` Dmitry A. Kazakov
2011-10-11  7:41         ` sharp ß " Yannick Duchêne (Hibou57)
2011-10-11  8:33           ` Dmitry A. Kazakov
2011-10-11 20:32             ` sharp ß " Randy Brukardt
2011-10-12  7:43               ` Dmitry A. Kazakov
2011-10-12  9:42                 ` J-P. Rosen
2011-10-12 12:09                   ` Dmitry A. Kazakov
2011-10-12 20:17                 ` sharp ß " Randy Brukardt
2011-10-12 21:18                   ` Dmitry A. Kazakov
2011-10-11 17:33     ` sharp ß " Martin Krischik
2011-10-11 18:54       ` Adam Beneschan
2011-10-12 13:03       ` Georg Bauhaus
2011-10-12 13:48         ` Dmitry A. Kazakov
2011-10-12 18:24           ` Georg Bauhaus
2011-10-12 20:06             ` sharp ß " Randy Brukardt
2011-10-12 20:48             ` sharp ß " Dmitry A. Kazakov
2011-10-12 22:56               ` sharp ß and ss in Ada keywords like AC CESS Georg Bauhaus
2011-10-13  8:10                 ` Dmitry A. Kazakov
2011-10-13 12:13                   ` Georg Bauhaus
2011-10-13 13:25                     ` Dmitry A. Kazakov
2011-10-13 15:18                       ` Georg Bauhaus
2011-10-13 19:17                         ` Dmitry A. Kazakov
2011-10-11  7:33   ` sharp ß and ss in Ada keywords like ACCESS Yannick Duchêne (Hibou57)
2011-10-11 14:32     ` Adam Beneschan
2011-10-11 17:26   ` sharp ß and ss in Ada keywords like ACCESS (better not) Martin Krischik
2011-10-12 12:34     ` Georg Bauhaus
2011-10-10 17:22 ` sharp ß and ss in Ada keywords like ACCESS Simon Wright
2011-10-10 17:45 ` AdaMagica
