From: Manuel Collado
Date: Mon, 11 Sep 2006 15:49:34 +0200
Newsgroups: comp.lang.ada
Subject: Re: Avatox 1.1: Trouble with encoding in Windows
Message-ID: <4505696b@news.upm.es>

Georg Bauhaus wrote:
> Manuel Collado wrote:
>
>> 1. The ASIS API should provide a way to know the character encoding of
>> the source file (I think it doesn't).
>
> Yes! This will help a lot in avoiding character set issues.
> And it might help prevent dodgy arguments like the ones presented
> by implementers against the clever requirement to write the
> identifier π in the Ada 2005 library. :-)

Spanish identifiers like 'tamaño' (size) or 'año' (year) are currently
accepted by GNAT.

>> 2. The non-ASCII characters could be converted to XML character
>> references (&#nnn;) by Avatox.
>
> This is beyond my comprehension, in particular when XML does
> have standardized character set support.
> Numeric character
> entities will force me to look at geographical names
> like España (Spain) or Łódź (in Poland)
> written Espa&#241;a and &#321;&#243;d&#378;
> respectively.

XML markup is meant to be written and read mostly by tools, not by
humans. So it doesn't matter whether a text fragment is coded as 'España'
or as 'Espa&#241;a'. In fact, after parsing, an XML processing agent
cannot know which form was used.

My suggestion is that the Avatox encoding issue can be solved by simply
writing non-ASCII characters as XML character references at the point
where the final XML output is generated.

> Will any Ada programmer find it pleasing to even write them as
> character strings in this equivalent way?
>
>    Country: Wide_String := "Espa" & Wide_Character'Val(241) & "a";
>    Town: Wide_String := (
>       Wide_Character'Val(16#141#),
>       Wide_Character'Val(16#F3#),
>       'd',
>       Wide_Character'Val(16#17A#));
>
> You could then go on and recommend writing Ada in the following style,
> just because some text editing tool that you remember might not properly
> handle white space:
>
>    Town: String := "New" & Character'Val(32) & "York";

This is out of scope. I was not talking about character representation
in Ada sources, only in XML documents.

> If programmers don't start accepting that there are more
> characters than can be expressed in 7-bit ASCII and start
> making less buggy text tools (including Ada tools)
> then anyone will continue to have difficulties in international
> communications.
>
> I hope that ASIS is a chance to get this done.

Amen to that.

> -- Georg

Regards.
-- 
Manuel Collado
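
P.S. A minimal sketch of the suggestion above, assuming the text to be
emitted is available as a Wide_String just before the XML is written.
To_Char_Refs and Char_Ref_Demo are hypothetical names, not part of Avatox
or ASIS. The helper only rewrites characters outside 7-bit ASCII and
leaves the usual escaping of '&' and '<' to the rest of the output code:

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Char_Ref_Demo is

   --  Hypothetical helper (not an Avatox or ASIS routine): returns Item
   --  with every character outside 7-bit ASCII replaced by a decimal
   --  XML character reference of the form &#nnn;.
   function To_Char_Refs (Item : Wide_String) return String is
      Result : Unbounded_String;
      Code   : Natural;
   begin
      for I in Item'Range loop
         Code := Wide_Character'Pos (Item (I));
         if Code < 128 then
            --  Plain ASCII is copied through unchanged.
            Append (Result, Character'Val (Code));
         else
            --  Natural'Image yields a leading blank, so trim it.
            Append
              (Result,
               "&#"
               & Ada.Strings.Fixed.Trim
                   (Natural'Image (Code), Ada.Strings.Left)
               & ";");
         end if;
      end loop;
      return To_String (Result);
   end To_Char_Refs;

   --  Same value as the Country example quoted above.
   Country : constant Wide_String :=
     "Espa" & Wide_Character'Val (241) & "a";

begin
   Ada.Text_IO.Put_Line (To_Char_Refs (Country));
end Char_Ref_Demo;
```

For Country this prints Espa&#241;a, which any XML parser reads back as
'España' whatever encoding the file itself is declared to use.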