From: "Dmitry A. Kazakov"
Subject: Re: Avatox 1.0: Trouble with encoding in Windows
Newsgroups: comp.lang.ada
Organization: cbb software GmbH
Date: Sat, 16 Sep 2006 09:46:21 +0200

On Sat, 16 Sep 2006 00:29:24 +0200, Georg Bauhaus wrote:

> On Fri, 2006-09-15 at 20:53 +0200, Dmitry A. Kazakov wrote:
>
>> IMO, the idea to use Unicode for program sources is wrong. The language (be
>> it formal or natural) should have a finite and reasonably small alphabet.
>> Unicode is practically an open-end set of symbols most of them you wouldn't
>> be able to either recognize or remember again.
>
> Unicode is quite flexible and allows a project to choose a reasonable
> subset of characters. A portable subset is fairly easy to describe
> because both Ada and UCS define a common character set from which you
> can choose. No lengthy discussions of how to interpret 8 bits,
> no issues with conforming compilers.

Do you disagree with the point? How can a language be based on multiple
alphabets? [You are talking about subsets.] Would it still be one language?
History offers examples of written natural languages changing their
alphabets.

> Greek.Ω /= Electric.Ω is an issue in Ada 95, too, when you
> use local character sets for two different files.
>
> Shou1d the number l, sorry, 1, not occur in source text, because it
> is too easy to miss the difference, so please, remove it from the
> Ada grammar? ;-)

That is an issue of choosing a proper typeface. But the Omega glyph is the
same, while the code positions (the semantic meaning of the symbol, Ohm vs.
Greek Omega) are different. That is exactly what is wrong, because the
semantics of a symbol should be defined solely by the language, by Ada in
our case. Unicode is not a language, at least not so far; however, nothing
would prevent us from defining a Unicode code position for every possible
Ada program... (:-))

> You can extend the Unicode subset chosen for the project later, without
> introducing ambiguity or a configuration issue. Using Unicode for
> program source text lets you write identifiers that just cannot coexists
> in Latin_1, or any 8bit character set.

There are many ways to make code unmaintainable, like writing identifiers
in the Linear B syllabary...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
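To make the Ohm versus Omega point concrete, here is a small sketch in Python, chosen only for illustration; none of this code comes from the thread. It assumes Python's standard `unicodedata` module and shows that U+2126 OHM SIGN and U+03A9 GREEK CAPITAL LETTER OMEGA are distinct code positions that compare unequal, even though canonical (NFC) normalization maps the Ohm sign to the Greek letter. The helper names `describe` and `same_after_nfc` are hypothetical. Whether any particular Ada compiler normalizes identifiers this way is not something this sketch establishes.

```python
import unicodedata

# The two characters discussed in the thread: in many fonts they render with
# the same Omega glyph, but they occupy different Unicode code positions.
OHM_SIGN = "\u2126"
GREEK_OMEGA = "\u03A9"


def describe(ch: str) -> str:
    """Return the code position and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after canonical (NFC) normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(describe(OHM_SIGN))      # U+2126 OHM SIGN
    print(describe(GREEK_OMEGA))   # U+03A9 GREEK CAPITAL LETTER OMEGA
    print(OHM_SIGN == GREEK_OMEGA)  # False: the raw code points differ
    print(same_after_nfc(OHM_SIGN, GREEK_OMEGA))  # True: NFC maps U+2126 to U+03A9
```

Plain comparison of the two characters fails while the NFC-normalized comparison succeeds; which of these behaviors a language definition should adopt for identifiers is the question debated in the post.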