From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: Ada generics
Date: Thu, 28 Dec 2006 18:35:06 +0100

On Thu, 2006-12-28 at 11:59 +0100, Dmitry A. Kazakov wrote:
> > so it cannot ignore Unicode issues if it wishes to allow Unicode
> > characters in identifiers.
>
> Which is a BAD idea, IMO.
>
> We cannot know anything about properties of letters in Klingon. As a
> practical example consider Russian, where e can be (and is) used in
> place of ё (see http://en.wikipedia.org/wiki/%D0%81), but not the
> reverse. Or maybe we should make Ada compilers capable of detecting
> programs written by Germans, so as to consider ü and ue the same?

Yes, writing source code is a question of being practical, which is
probably not easily formalized... An international character set for
portable programs leaves only a few choices open if they are to be
practical, does it not? Naturally, mathematical fancies like
completeness, freedom from contradiction, etc. are out of the question
when it comes to writing for both humans and computers. What's the
point of having a high-level language when you are only allowed
identifiers that the most simplistic mechanical interpreter can
"understand"?

Why is it that programmers become somewhat irrational and impractical
when it comes to character sets? They happily devise all kinds of
pattern recognition algorithms, tricky transformations, ways to get the
best out of fuzzy measurement procedures, and so on. But not so with
character sets. No no, every schoolchild knows that characters must be
such and such... (Maybe early exposure to characters is to blame here;
everyone is an expert. :-)

Anyway, do we have some data we could discuss that would show the
practical importance of Unicode/casing issues? Or do we really have
programmers who are well versed in using a keyboard connected to a
computer and still can't write a program that tells apple characters
from orange characters?
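Just so we are looking at the same thing, here is a tiny sketch of what
an identifier beyond ASCII looks like in practice. (A toy, not a
recommendation: the names are invented, and with GNAT you would need
something like the -gnatW8 switch so the source is read as UTF-8.)

with Ada.Text_IO;

--  Ada 2005 lets identifiers use letters beyond ASCII, so 'Größe'
--  ("size") is a legal name.  Whether 'Größe' and 'Groesse' ought
--  ever to denote the same entity is exactly the question at issue.
procedure Unicode_Identifier_Demo is
   Größe : constant Positive := 42;
begin
   Ada.Text_IO.Put_Line (Positive'Image (Größe));
end Unicode_Identifier_Demo;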
GNAT already detects identifiers that are spelled similarly: when one
of them is misspelled, the error message lists its "relatives". Surely
a helpful feature, and proof that practical handling of natural
language identifiers is possible.

As an example, since you have been referring to German, consider that
the sharp s, 'ß', is usually written "SS" when capitalized. So "Straße"
tends to become "STRASSE". Now if you have a compound word that has
- a 'ß', and
- an 's' right after it,
such as "Maßstab" (= scale, rule, yardstick), then from a simple-minded
formalist's perspective I could argue: "Using Unicode is nonsense
because there is no 1:1 mapping for the German word 'Maßstab', which
will become 'MASSSTAB'. "SSS" is ambiguous: it could be "sß" or it
could be "ßs". That's too big a challenge for a compiler writer. So
leave me alone with your Unicode and case insensitivity." Is that what
computer science has to answer when asked about character handling?
(A rough sketch of this kind of folding appears at the end of this
post.)

Challenge: try to find a significant number of German words that have
an 's' before a 'ß'. What is the consequence of your findings?

Even if there are ambiguities in other languages, ambiguities are not
new to Ada (nor to C++, IIRC), and they have been addressed. (It seems
that the introduction of Unicode in the sixth revision of Scheme has
recently made that Lisp case sensitive, based on arguments such as the
one above. To me this shows "practicality" on the part of the language
designer, vulgo just compiler writer's laziness.)

If the programmers' representatives (the ARG, for example) agree that
it is practical to exclude some casing rules or "representation rules",
such as "ue" <-> 'ü', I'm perfectly happy. Because the rule *is*
practical, it helps get work done, and to hell with mathematical
fancies and game-theoretic character-shuffling possibilities when they
do not really matter.

> What about parsing the source right to left, or top to bottom?

The writing direction problem is solved. Similarly, it has proved
possible and practical to connect big-endian and little-endian
computers and have them cooperate, using suitable algorithms. Both
kinds exist, as do apples, oranges, bananas, and pineapples. We can
make nice fruit salads.
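For the curious, here is the rough sketch mentioned above: a
hand-rolled toy, not a library routine. The folding rule (expand 'ß'
to "ss", lower ASCII capitals, pass everything else through) is
deliberately naive, and again GNAT would need the source read as
UTF-8 (-gnatW8).

with Ada.Wide_Wide_Text_IO;  use Ada.Wide_Wide_Text_IO;

procedure Fold_Demo is

   --  Naive folding: 'ß' expands to "ss", ASCII capitals are lowered,
   --  everything else passes through unchanged.
   function Fold (S : Wide_Wide_String) return Wide_Wide_String is
   begin
      if S'Length = 0 then
         return "";
      elsif S (S'First) = 'ß' then
         return "ss" & Fold (S (S'First + 1 .. S'Last));
      elsif S (S'First) in 'A' .. 'Z' then
         return Wide_Wide_Character'Val
                  (Wide_Wide_Character'Pos (S (S'First)) + 32)
                & Fold (S (S'First + 1 .. S'Last));
      else
         return S (S'First) & Fold (S (S'First + 1 .. S'Last));
      end if;
   end Fold;

begin
   --  Both spellings fold to "massstab"; the mapping is not 1:1,
   --  which is the alleged show-stopper discussed above.
   Put_Line (Fold ("Maßstab"));
   Put_Line (Fold ("Massstab"));
end Fold_Demo;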