From: Stoik
Newsgroups: comp.lang.ada
Subject: Re: strange behaviour of utf-8 files
Date: Sun, 17 Nov 2013 02:38:05 -0800 (PST)
Message-ID: <5096eb08-bb8f-4701-b8c2-5cfaef6885c7@googlegroups.com>
In-Reply-To: <5287a4d3$0$9523$9b4e6d93@newsspool1.arcor-online.net>
References: <73e0853b-454a-467f-9dc7-84ca5b9c29b2@googlegroups.com> <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net> <9d00683c-949c-4e88-a161-ebd78b350d39@googlegroups.com> <5287a4d3$0$9523$9b4e6d93@newsspool1.arcor-online.net>

On Saturday, 16 November 2013 18:01:07 UTC+1, Georg Bauhaus wrote:
> On 16.11.13 16:09, Stoik wrote:
>
> > Thanks for the answer. Your advice is certainly sound, but not very
> > satisfactory. The whole purpose of utf-8 is to make things portable
> > across platforms. If the compiler cannot deal properly with the
> > source code written in the utf-8 encoding, then the whole effort that
> > went into all the wide_ and wide_wide_ packages and the new packages
> > that deal with various encodings is lost (all the Latin-x
> > possibilities are useless anyway, at least on Windows platform). I am
> > adjoining a trivial program which works differently according to the
> > encoding (UTF-8 or ISO-8859-1) of the source code, printing 1 or 2 as
> > the answer.
> >
> > with ada.text_io; use ada.text_io;
> > procedure example is
> >   S : String := "ó";
> > begin
> >   Put_Line (S'Length'Img);
> > end;
>
> GNAT has two switches that affect its way of looking at
> coded characters in source text:
>
> for identifiers in source text, specify -gnatiC
> where C is one of the characters listed 3.2.10
> of the GNAT UG accompanying the compiler;
>
> for the wide character encoding method, specify -gnatWE
> where E is one of the characters listed in the
> same document.
>
> With switch -gnatW8, I get
>
> $ ./example
> 1
> $
>
> That is, the source text is understood to be encoded
> in UTF-8, and 'ó' becomes Character'Val (243), viz. LC_O_Acute.

Thank you for solving the problem. By mistake, I thanked another author first.
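
For anyone who wants to see the 1-versus-2 difference without depending on how the source file is saved, here is a minimal sketch. It assumes an Ada 2012 compiler (for Ada.Strings.UTF_Encoding); the procedure name Encoding_Demo and the object names are only illustrative and do not come from the thread. It builds 'ó' explicitly as one Latin-1 character and then encodes it as UTF-8, so the source itself stays plain ASCII.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Strings;

procedure Encoding_Demo is
   --  'ó' as a single Latin-1 character, code point 243 (LC_O_Acute).
   Latin_1 : constant String := (1 => Character'Val (243));

   --  The same character encoded as UTF-8: two bytes, 16#C3# 16#B3#.
   UTF_8 : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1);
begin
   --  Reports a length of 1: one Character holds the whole letter.
   Ada.Text_IO.Put_Line
     ("Latin-1 length:" & Integer'Image (Latin_1'Length));

   --  Reports a length of 2: each UTF-8 byte occupies one Character.
   Ada.Text_IO.Put_Line
     ("UTF-8 length:  " & Integer'Image (UTF_8'Length));
end Encoding_Demo;
```

The two lengths correspond to the two answers discussed above. As far as I understand it, when a UTF-8 source file is compiled without -gnatW8, each of the two bytes of 'ó' inside the string literal is taken as a separate Character, which gives 2; with -gnatW8 the pair is decoded into the single Character'Val (243), which gives 1.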