From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: Avatox 1.0: Trouble with encoding in Windows
Date: Mon, 11 Sep 2006 12:35:58 +0200
Message-ID: <45053aec$0$5142$9b4e6d93@newsspool1.arcor-online.net>
References: <45051d37@news.upm.es>

Manuel Collado wrote:

> 1. The ASIS API should provide a way to know the character encoding of
> the source file (I think it doesn't).

Yes! This will help a lot in avoiding character set issues.
And it might help prevent dodgy arguments like the ones presented
by implementers against the clever requirement to write the
identifier π in the Ada 2005 library. :-)

> 2. The non-ASCII characters could be converted to XML character
> references (&#nnn;) by Avatox.

This is beyond my comprehension, in particular when XML does
have standardized character set support.

Numeric character references will force me to look at geographical
names like España (Spain) or Łódź (in Poland) written Espa&#241;a
and &#321;&#243;d&#378; respectively. Will any Ada programmer find it
pleasing to even write them as character strings in this equivalent way?

   Country : Wide_String := "Espa" & Wide_Character'Val(241) & "a";
   Town    : Wide_String :=
     (Wide_Character'Val(16#141#), Wide_Character'Val(16#F3#),
      'd', Wide_Character'Val(16#17A#));

You could then go on and recommend writing Ada in the following style,
just because some text editing tool that you remember might not
properly handle white space:

   Town : String := "New" & Character'Val(32) & "York";

If programmers don't start accepting that there are more characters
than can be expressed in 7-bit ASCII, and start making less buggy
text tools (including Ada tools), then everyone will continue to
have difficulties in international communications.
I hope that ASIS is a chance to get this done.

-- Georg