From: Manuel Collado
Newsgroups: comp.lang.ada
Subject: Re: Avatox 1.0: Trouble with encoding in Windows
Date: Wed, 13 Sep 2006 12:32:34 +0200
Message-ID: <4507de42@news.upm.es>
References: <45051d37@news.upm.es> <45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net> <5ZednRK-0M3K15rYnZ2dnUVZ_o2dnZ2d@megapath.net>
In-Reply-To: <5ZednRK-0M3K15rYnZ2dnUVZ_o2dnZ2d@megapath.net>

Randy Brukardt wrote:
> "Georg Bauhaus" wrote in message
> news:45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net...
>
>>Manuel Collado wrote:
>>
>>>1. The ASIS API should provide a way to know the character encoding of
>>>the source file (I think it doesn't).
>>
>>Yes! This will help a lot in avoiding character set issues.
>>And it might help prevent dodgy arguments like the ones presented
>>by implementers against the clever requirement to write the
>>identifier ? in the Ada 2005 library. :-)
>
> ASIS 99 currently returns identifiers in Wide_Strings. That is enough to
> handle all possible Ada 95 programs. I suspect that the problem is in the
> XML conversion tool not handling Wide_Characters properly and not with ASIS.
> (Or just as likely, the XML processing tools not handling UTF-8 properly.)
>
> I suspect that the new version of ASIS will provide an option to get
> identifiers in Wide_Wide_Strings.

Sorry, but the use of [Wide_]Wide_Strings doesn't imply anything about
encoding. The Avatox problem appears only with characters whose codes
are below 256. Example: the character with code 0xC1 is

   0xC1   0x00C1   # LATIN CAPITAL LETTER A WITH ACUTE

if encoded as ISO-8859-1 (western countries), but it is

   0xC1   0x0391   # GREEK CAPITAL LETTER ALPHA

if encoded as ISO-8859-7 (Greece). The use of wide_chars just extends
the codepoint range.

To solve the problem, a translation is required from the original source
file encoding to a specific standard encoding (Unicode?) for strings
reported via the ASIS API. Or else, make no translation, and also report
the original source code encoding. Either way, the ASIS application can
interpret (or simply report) strings in a meaningful way.

>
> In any case, one of the big advantages of using ASIS over writing your own
> parser is that the resulting program is independent of the character set
> used. So it works with anything supported by your compiler vendor (and still
> does if you change vendors). ASIS code that depends on the input source
> representation (which is not defined by Ada anyway) is probably broken. And
> there is no chance of any sort of agreement on source representations for
> ASIS (or even the naming of them) if there isn't any for Ada.

I'm not sure I understand you. Some style checks depend on the source
code representation, such as detecting non-uniform casing of identifiers
(mixing alpha and Alpha in the same source). Am I missing anything?

>
> Randy.

Regards.
-- 
To reply by e-mail, please remove the extra dot
in the given address: m.collado -> mcollado
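
A minimal sketch of the 0xC1 example above, written in Python only because its standard library ships both codecs; it says nothing about how ASIS or Avatox actually behave. The names `RAW` and `describe` are illustrative assumptions, not part of any tool mentioned in this thread.

```python
import unicodedata

# A single byte as it might appear inside an identifier in a source file.
RAW = bytes([0xC1])


def describe(raw: bytes, encoding: str) -> str:
    """Decode one byte with the given encoding and name the resulting character."""
    ch = raw.decode(encoding)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The same byte yields different Unicode characters depending on the
# encoding assumed for the source file.
latin1 = describe(RAW, "iso-8859-1")
greek = describe(RAW, "iso-8859-7")

print(latin1)  # U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
print(greek)   # U+0391 GREEK CAPITAL LETTER ALPHA
```

The byte is identical in both calls; only the assumed encoding changes, which is why a tool that receives the string without the encoding (or without a prior translation to Unicode) cannot tell which character was meant.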