From: Georg Bauhaus
Date: Sat, 16 Nov 2013 18:01:07 +0100
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
References: <73e0853b-454a-467f-9dc7-84ca5b9c29b2@googlegroups.com> <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net> <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com>
In-Reply-To: <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com>
Message-ID: <5287a4d3$0$9523$9b4e6d93@newsspool1.arcor-online.net>

On 16.11.13 16:09, Stoik wrote:
> Thanks for the answer. Your advice is certainly sound, but not very
> satisfactory. The whole purpose of utf-8 is to make things portable
> across platforms.
> If the compiler cannot deal properly with the source code written in
> the utf-8 encoding, then the whole effort that went into all the wide_
> and wide_wide_ packages and the new packages that deal with various
> encodings is lost (all the Latin-x possibilities are useless anyway,
> at least on Windows platform). I am adjoining a trivial program which
> works differently according to the encoding (UTF-8 or ISO-8859-1) of
> the source code, printing 1 or 2 as the answer.
>
> with ada.text_io; use ada.text_io;
> procedure example is
>    S : String := "ó";
> begin
>    Put_Line (S'Length'Img);
> end;

GNAT has two switches that affect its way of looking at coded
characters in source text:

for identifiers in source text, specify -gnatiC, where C is one
of the characters listed in section 3.2.10 of the GNAT UG
accompanying the compiler;

for the wide character encoding method, specify -gnatWE, where E
is one of the characters listed in the same document.

With switch -gnatW8, I get

$ ./example
 1
$

That is, the source text is understood to be encoded in UTF-8,
and 'ó' becomes Character'Val (243), viz. LC_O_Acute.
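
To see the two lengths side by side without relying on the encoding of the source file or on any compiler switch, here is a minimal sketch. The procedure name Length_Demo is made up for illustration, and the sketch assumes an Ada 2012 compiler, since Ada.Strings.UTF_Encoding was added in that revision. It names the character explicitly and then encodes it as UTF-8, so the source itself stays pure ASCII:

```ada
with Ada.Text_IO;                       use Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding;          use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

--  Length_Demo is a hypothetical name, not part of the original example.
procedure Length_Demo is
   --  A single Character, code point 243, named explicitly so that
   --  the result does not depend on how the source file is encoded.
   S : constant String := (1 => Ada.Characters.Latin_1.LC_O_Acute);

   --  The same text as a sequence of UTF-8 octets (16#C3# 16#B3#).
   U : constant UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (S);
begin
   Put_Line (Integer'Image (S'Length));  --  one Character
   Put_Line (Integer'Image (U'Length));  --  two octets
end Length_Demo;
```

The second figure is presumably what the original program reports when a UTF-8 encoded source file is compiled without -gnatW8: each of the two octets is then taken as a separate Latin-1 character.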