From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Thread: 103376,a82f86f344c98f79
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,UTF8
Date: Mon, 11 Sep 2006 18:43:39 +0200
From: Georg Bauhaus <bauhaus@futureapps.de>
User-Agent: Thunderbird 1.5.0.2 (X11/20060522)
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Avatox 1.1: Trouble with encoding in Windows
References: <45051d37@news.upm.es>
 <45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net> <4505696b@news.upm.es>
In-Reply-To: <4505696b@news.upm.es>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Message-ID: <45059117$0$5144$9b4e6d93@newsspool1.arcor-online.net>
Organization: Arcor
NNTP-Posting-Date: 11 Sep 2006 18:38:47 CEST
NNTP-Posting-Host: 8442b772.newsspool1.arcor-online.net
X-Complaints-To: usenet-abuse@arcor.de
Xref: g2news2.google.com comp.lang.ada:6552
Date: 2006-09-11T18:38:47+02:00
List-Id: <comp.lang.ada>

Manuel Collado wrote:

>> And it might help prevent dodgy arguments like the ones presented
>> by implementers against the clever requirement to write the
>> identifier π in the Ada 2005 library. :-)
> 
> Spanish identifiers like 'tamaño' (size) or 'año' (year) are currently
> accepted by GNAT.

Which makes the argument against π in the library even more bogus
in my book ;-)

> XML markup is meant to be written and read mostly by tools, not by
> humans. So it doesn't matter if a text fragment is coded as 'España' or
> as 'Espa&#xF1;a'. In fact, after parsing, an XML processing agent cannot
> know how it was coded.

Oh, there is nothing stopping an XML processor from keeping track of
input properties, even when the character representation is not an
issue after parsing.
Just like an ASIS tool could (should?) know the character encoding
of the Ada sources it has read.

> it doesn't matter if a text fragment is coded as 'España' or as
> 'Espa&#xF1;a'.

>>    Country: Wide_String := "Espa" & Wide_Character'Val(241) & "a";
...
>>    Town: String := "New" & Character'Val(32) & "York";
>>
> 
> This is outside of scope. I've not spoken about adequate character
> representation in Ada sources, just in XML documents.

Right, this was meant as an analogy: When I have to look at the
text, not process it, I'll be glad if identifiers and literals
are easy to read.

I think there is still a tradeoff between a 7-bit external
representation of ASIS in XML and its usability[1].
For example, when you look at ASIS streams in order to find out
why one of them is broken, XML processors can't do much, because
a broken stream means their input is broken as well.
Or when I am developing an XSL transformation for
"refactoring" some of the identifiers in a program,
then I will have to look hard at "tama&#xF1;o" in order
to see that it is just "tamaño". That's not productive in my view.
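
To make the tradeoff concrete, here is a minimal sketch in Python
(chosen only for illustration, it is not something used in this
thread; it relies solely on the standard xml.etree.ElementTree
module, and the element name "id" is made up). It suggests that
both spellings give the same text after parsing, and that a 7-bit
serialization brings the character reference back, which is the
form a human then has to read.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element documents; the element name "id" is made up.
escaped = ET.fromstring("<id>tama&#xF1;o</id>")
literal = ET.fromstring("<id>tamaño</id>")

# After parsing, both spellings yield the same text.
same_after_parsing = escaped.text == literal.text

# Serializing as 7-bit ASCII forces a character reference for ñ,
# while a Unicode serialization keeps the readable form.
as_ascii = ET.tostring(literal, encoding="us-ascii")
as_text = ET.tostring(literal, encoding="unicode")

print(same_after_parsing)
print(as_ascii)
print(as_text)
```

The processing tool sees no difference between the two inputs;
only the person reading the ASCII output does.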

 [1]  7-bit might seem simple bitwise, but it isn't necessarily
easier to process, because character references such as &#xF1;
must be handled, too.

-- Georg