From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: sharp ß and ss in Ada keywords like AC CESS
Date: Thu, 13 Oct 2011 14:13:36 +0200
Message-ID: <4e96d5f0$0$6541$9b4e6d93@newsspool4.arcor-online.net>
References: <665628584340145751.161513rm-host.bauhaus-maps.arcor.de@news.arcor.de> <1tgwf2ey7q1qz.hpcw6dmx2aj2$.dlg@40tude.net>
In-Reply-To: <1tgwf2ey7q1qz.hpcw6dmx2aj2$.dlg@40tude.net>

On 13.10.11 10:10, Dmitry A. Kazakov wrote:

> How is "acceβ" worse than "acceß"?

The first identifier mixes two different "alphabets";
the second identifier uses a variant spelling.
The second identifier can be changed without debate.
This is how "acceβ" is worse than "acceß".

The first identifier, mixing two "alphabets", is like a literal
mixing Roman numerals and Arabic numerals to form a single
numeric literal:

   Some_Year : constant := MCD92;

Yeah, that's fun.
And there are no Chinese digits in it, so Europeans and
Americans will likely recognize the intent. But otherwise?
What's the point of mixing two numeric alphabets?

>>>    type Acceß_Type is access Integer;
>>>    type Access_Type is access String;
>>
>> I'd prefer them to be the same in this particular
>> case, since the Swiss model (which is without ß)
>> is working.
>
> What is the reason for them to be same?

"the Swiss model (which is without ß) is working."
It is more than technical work, see below.

> How do you know in which alphabet is "Mass"? Why should it conflict with
> "Maß" for some French programmer?

The identifiers shouldn't be in conflict, but Ada makes them
be in conflict; Randy has stated one reason. I can think of two
reasons for ss = ß but I /= І:

1) I /= І, since they are from "alphabets" that real people
think are different. To make the example less artificial,
let Αδα /= Ada. Put your projects at risk and hire
programmers who would write Aδα (A["03B4"]["03B1"]).
The compiler will help you find them out.

2) ss = ß because real people think and act as though they are
the same, and, importantly, more so (note the non-binary,
comparative phrase) WRT ss = ß than WRT ä = ae, since
absence of ä is considered a computer thingy, but equivalence
of ss and ß is well established with or without computer.
It is a different issue.

Engineers have projected the situation onto the technical
axis and wisely introduced a capital ß. This will not resolve
(2) above. The programming situation is placed somewhere
along several axes, not just a technical one. I therefore
speculate that capital ß will add another power of two to the
complexity of the issue in real life. (Or, finally, make us
learn how to use Unicode properly.)

"Mass" has four Latin characters, "Maß" has three, if you
ask any child; they are both written using Latin characters
if you ask any programmer, not Greek, not Hebrew, not Cyrillic,
or anything, just Latin.
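
The two reasons above can be tried out in a few lines. This is only a rough sketch in Python, not Ada's actual identifier rule: it assumes that the first word of a character's Unicode name ("LATIN", "GREEK", "CYRILLIC") is a good enough stand-in for its "alphabet", and it uses Python's str.casefold as a stand-in for an equivalence under which ss = ß. The function names are made up for the illustration.

```python
import unicodedata


def scripts_used(identifier):
    """Approximate the set of "alphabets" in an identifier.

    Assumption: the first word of the Unicode character name
    (LATIN, GREEK, CYRILLIC, ...) names the alphabet. Digits and
    underscores are skipped because they are not letters.
    """
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(identifier):
    """True if the letters come from more than one "alphabet"."""
    return len(scripts_used(identifier)) > 1


def same_identifier(a, b):
    """Compare two identifiers under full case folding.

    str.casefold maps sharp s to "ss", so variant spellings with
    ss and sharp s compare equal. It does not equate a-umlaut
    with "ae", and it keeps Latin I and Cyrillic I apart.
    """
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    print(is_mixed_script("acce\u03b2"))         # Latin letters plus Greek beta
    print(is_mixed_script("acce\u00df"))         # Latin letters plus sharp s
    print(is_mixed_script("A\u03b4\u03b1"))      # Latin A, Greek delta and alpha
    print(same_identifier("Ma\u00df", "Mass"))   # sharp s versus ss
    print(same_identifier("I", "\u0406"))        # Latin I versus Cyrillic I
```

A rule along these lines would flag "acceβ" and "Aδα" as mixing alphabets while treating "Maß" and "Mass" as one name; a real rule would need proper script data rather than character names.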
> If "Latin" does not mean Latin, then you need yet another nonsensical rule
> to redefine it.

"Latin" is here meant to refer to the general thing.
"Latin characters used in Europe" is pretty clear, and no one
will sue you if you include ä or ł. Declaring simple
unions of sections from Unicode is easy, and consistent.
You just ask some programmers who know both character sets and
who have been programming for some time. The intent
is not to bend existing logic; the intent is to have rules
that prevent too much fun with identifiers (note the comparative).

Glyphs don't help at all, either when constructing an issue
or when resolving an issue. For example, in my terminal
window, the second І looks like |, it does not have serifs.
I does.

> Who are these people?

The people who influence standards.

> Except that half of those languages use no Cyrillic letters at all (e.g.
> Polish).

Poles will still consider sets of Cyrillic characters to
be related, I should think. Possibly more so than people
west of Poland if Poles more frequently know another
Slavic language.

>> If a word looks like a mix of Cyrillic characters,
>
> You cannot see characters, you do glyphs.

That's techno-speak again, but programmers see characters
if you ask them. Techno-think is not alone in establishing
writing habits.

> You cannot
> safely recognize alphabet looking at a single word.

I am looking at programs, not at single words.

>
>> A programmer
>> seeing Cyrillic characters will, on average, be
>> right in assuming that he is seeing some
>> identifier written in some Slavic language.
>
> Program legality based on statistic analysis? That must be a lot of fun!

We employ tons of statistics when reading text. Reading
programs is also best when it is fun. We also employ
averaging when writing programs (which pattern has worked
best, what is a good name for this thing, what has worked
in the past, etc.)
Hoare has suggested the use of a macro named
PRELIMINARY_ASSUMPTION, IIRC, so "assuming" seems a
normal attitude when programming, in general.

__
[*] I am referring to the German writers making a scene
vis-à-vis the spelling reform and talking seriously about
the destruction of culture.