From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,a82f86f344c98f79
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,UTF8
Path: 
 g2news2.google.com!news4.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!newsfeed00.sul.t-online.de!t-online.de!news.mind.de!news.musoftware.de!news.weisnix.org!newsfeed.ision.net!newsfeed2.easynews.net!ision!newsfeed.arcor.de!newsspool1.arcor-online.net!news.arcor.de.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Avatox 1.0: Trouble with encoding in Windows
Newsgroups: comp.lang.ada
User-Agent: 40tude_Dialog/2.0.15.1
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
Reply-To: mailbox@dmitry-kazakov.de
Organization: cbb software GmbH
References: <45051d37@news.upm.es>
 <45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net>
 <5ZednRK-0M3K15rYnZ2dnUVZ_o2dnZ2d@megapath.net>
 <1158145462.921837.152720@i42g2000cwa.googlegroups.com>
 <1158224191.059815.103080@i42g2000cwa.googlegroups.com>
 <lEjOg.187763$1i1.48666@attbi_s72> <450AECB4.3060000@obry.net>
 <qbnprbmh5ou3.1cbnvrwbaa0ax$.dlg@40tude.net>
 <1158359363.29388.36.camel@localhost.localdomain>
Date: Sat, 16 Sep 2006 09:46:21 +0200
Message-ID: <d2k1w4avbl71.a3iefvlvh858.dlg@40tude.net>
NNTP-Posting-Date: 16 Sep 2006 09:46:05 CEST
NNTP-Posting-Host: 2a3613a5.newsspool1.arcor-online.net
X-Trace: 
 DXC=KZo1`GEn[8MFJ3]dH>I?oEic==]BZ:afN4Fo<]lROoRAgUcjd<3m<;BHb_aCWV=`VE[6LHn;2LCVN[<mhadbfdUKFHAg?KCGB7Jm`S^79Ag8JD
X-Complaints-To: usenet-abuse@arcor.de
Xref: g2news2.google.com comp.lang.ada:6605
Date: 2006-09-16T09:46:05+02:00
List-Id: <comp.lang.ada>

On Sat, 16 Sep 2006 00:29:24 +0200, Georg Bauhaus wrote:

> On Fri, 2006-09-15 at 20:53 +0200, Dmitry A. Kazakov wrote:
> 
>> IMO, the idea to use Unicode for program sources is wrong. The language (be
>> it formal or natural) should have a finite and reasonably small alphabet.
>> Unicode is practically an open-end set of symbols most of them you wouldn't
>> be able to either recognize or remember again.
> 
> Unicode is quite flexible and allows a project to choose a reasonable
> subset of characters. A portable subset is fairly easy to describe
> because both Ada and UCS define a common character set from which you
> can choose. No lengthy discussions of how to interpret 8 bits,
> no issues with conforming compilers.

Do you disagree with the point? How can a language be based on multiple
alphabets? [You are talking about subsets.] Would it still be one language?
History has examples of written natural languages changing alphabets.

> Greek.Ω /= Electric.Ω is an issue in Ada 95, too, when you
> use local character sets for two different files.
> 
> Shou1d the number l, sorry, 1, not occur in source text, because it
> is too easy to miss the difference, so please, remove it from the
> Ada grammar? ;-)

That is an issue of choosing a proper typeface. But the Omega glyph is the
same; the code positions (the semantic meaning of the symbol, Ohm vs. Greek
Omega) are different. Exactly this is wrong, because the semantics of a
symbol is to be defined solely by the language, by Ada in our case. Unicode
is not a language, so far; however, nothing would prevent us from defining a
Unicode position for any possible Ada program... (:-))
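
To make the glyph vs. code position distinction concrete, here is a small
sketch in Python (not Ada, and not part of the original exchange). It uses
only the standard unicodedata module; the names GREEK_OMEGA, OHM_SIGN,
describe and same_after_nfc are made up for illustration. It shows that the
two Omegas are distinct code positions, and that Unicode's own canonical
normalization (NFC) folds the Ohm sign into the Greek letter.

```python
import unicodedata

# Two code positions that are normally rendered with the same glyph.
GREEK_OMEGA = "\u03a9"  # GREEK CAPITAL LETTER OMEGA
OHM_SIGN = "\u2126"     # OHM SIGN


def describe(ch: str) -> str:
    """Return the code position and Unicode character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(describe(GREEK_OMEGA))
    print(describe(OHM_SIGN))
    # Raw comparison: the two characters are different.
    print(GREEK_OMEGA == OHM_SIGN)
    # After NFC normalization they compare equal.
    print(same_after_nfc(GREEK_OMEGA, OHM_SIGN))
```

Whether a compiler should compare identifiers raw or normalized is exactly
the kind of decision that belongs to the language definition; the sketch
only shows that the two readings give different answers.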

> You can extend the Unicode subset chosen for the project later, without
> introducing ambiguity or a configuration issue. Using Unicode for
> program source text lets you write identifiers that just cannot coexist
> in Latin_1, or any 8bit character set.

There are many ways to make code unmaintainable, like writing identifiers
in the Linear B syllabary...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de