comp.lang.ada
From: Manuel Collado <m.collado@lml.ls.fi.upm.es>
Subject: Re: Avatox 1.0: Trouble with encoding in Windows
Date: Wed, 13 Sep 2006 12:32:34 +0200
Message-ID: <4507de42@news.upm.es>
In-Reply-To: <5ZednRK-0M3K15rYnZ2dnUVZ_o2dnZ2d@megapath.net>

Randy Brukardt wrote:
> "Georg Bauhaus" <bauhaus@futureapps.de> wrote in message
> news:45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net...
> 
>>Manuel Collado wrote:
>>
>>>1. The ASIS API should provide a way to know the character encoding of
>>>the source file (I think it doesn't).
>>
>>Yes! This will help a lot in avoiding character set issues.
>>And it might help prevent dodgy arguments like the ones presented
>>by implementers against the clever requirement to write the
>>identifier π in the Ada 2005 library. :-)
> 
> ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
> handle all possible Ada 95 programs. I suspect that the problem is in the
> XML conversion tool not handling Wide_Characters properly and not with ASIS.
> (Or just as likely, the XML processing tools not handling UTF-8 properly.)
> 
> I suspect that the new version of ASIS will provide an option to get
> identifiers in Wide_Wide_Strings.

Sorry, but the use of [Wide_]Wide_Strings doesn't imply anything about
encoding. The Avatox problem appears even with characters whose
codepoints are < 256. Example: the character stored as byte 0xC1 is

    0xC1	0x00C1	#	LATIN CAPITAL LETTER A WITH ACUTE

if encoded as ISO-8859-1 (western countries), but it is

    0xC1	0x0391	#	GREEK CAPITAL LETTER ALPHA

if encoded as ISO-8859-7 (Greece). The use of wide_chars just extends 
the codepoint range.
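
The same point as a minimal Python sketch, using only the standard codecs (this is an illustration, not part of Avatox or ASIS; the helper name `describe` is made up for the example). The single byte 0xC1 yields two different characters depending on which encoding is assumed:

```python
import unicodedata

RAW = b"\xC1"  # one byte, as it would be stored in the source file


def describe(raw: bytes, encoding: str) -> str:
    """Decode the bytes under the given encoding and name the resulting character."""
    ch = raw.decode(encoding)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin1 = describe(RAW, "iso8859_1")  # interpreted as ISO-8859-1
greek = describe(RAW, "iso8859_7")   # interpreted as ISO-8859-7

print(latin1)
print(greek)
```

Widening the character type changes neither result; only the choice of encoding does.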

To solve the problem, strings reported via the ASIS API need to be
translated from the original source file encoding to a specific standard
encoding (Unicode?). Alternatively, do no translation, but also report
the original source file encoding. Either way the ASIS application
can interpret (or simply report) strings in a meaningful way.
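
A rough sketch of the two options in Python, assuming the source encoding is known to whoever produces the strings. The names `normalize`, `tag` and `TaggedName` are hypothetical and do not correspond to anything in the ASIS API:

```python
from typing import NamedTuple


class TaggedName(NamedTuple):
    """Hypothetical record: untranslated bytes plus the encoding they were read in."""
    raw: bytes
    encoding: str


def normalize(raw: bytes, source_encoding: str) -> str:
    """Option 1: translate from the source encoding to Unicode before reporting."""
    return raw.decode(source_encoding)


def tag(raw: bytes, source_encoding: str) -> TaggedName:
    """Option 2: report the bytes untouched, together with their encoding."""
    return TaggedName(raw, source_encoding)


# The bytes of a four-letter identifier as stored in an ISO-8859-7 file.
raw_name = b"\xC1\xEB\xF6\xE1"

unicode_name = normalize(raw_name, "iso8859_7")
tagged_name = tag(raw_name, "iso8859_7")

# With option 2 the consumer can still recover the same text on its own.
recovered = tagged_name.raw.decode(tagged_name.encoding)
print(recovered == unicode_name)
```

In both cases the application ends up with unambiguous text; what does not work is handing over the bytes (or widened codepoints) with no encoding information at all.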

> 
> In any case, one of the big advantages of using ASIS over writing your own
> parser is that the resulting program is independent of the character set
> used. So it works with anything supported by your compiler vendor (and still
> does if you change vendors). ASIS code that depends on the input source
> representation (which is not defined by Ada anyway) is probably broken. And
> there is no chance of any sort of agreement on source representations for
> ASIS (or even the naming of them) if there isn't any for Ada.

I'm not sure I understand you. Some style checks depend on the source
code representation, such as detecting non-uniform casing of identifiers
(mixing alpha and Alpha in the same source).
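
To make the kind of check concrete, here is a small Python sketch. It assumes the identifiers are already available as correctly decoded text, exactly as spelled in the source; the function name `mixed_casing` is made up for this example:

```python
from collections import defaultdict


def mixed_casing(identifiers):
    """Map each case-folded name to its spellings when more than one spelling occurs."""
    spellings = defaultdict(set)
    for name in identifiers:
        spellings[name.casefold()].add(name)
    return {key: sorted(forms) for key, forms in spellings.items() if len(forms) > 1}


# "alpha" and "Alpha" denote the same Ada identifier but are spelled differently.
report = mixed_casing(["alpha", "Alpha", "Beta", "Beta"])
print(report)
```

Such a check only makes sense if the tool sees each identifier as it was actually written, and if case folding is applied to properly decoded characters.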

Am I missing anything?

> 
>                            Randy.

Regards.
-- 
To reply by e-mail, please remove the extra dot
in the given address:  m.collado -> mcollado
