From: rieachus@comcast.net
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 Output and "-gnatW8"
Date: Thu, 24 Mar 2016 22:54:31 -0700 (PDT)
Message-ID: <3a65e71c-41ee-49eb-916d-c0be8be9abc6@googlegroups.com>
In-Reply-To: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com>
References: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com>

On Thursday, March 24, 2016, Michael Rohan wrote:
> The implementation for GNAT impacts the handling of strings, e.g.,
>
>     S : constant Wide_String := "π";
>
> With "-gnatW8" this is correctly interpreted as a string of length 1
> containing the character U+03C0.
> Without the "-gnatW8" option, GNAT
> interprets it as a string of Characters to convert to a Wide_String,
> i.e., the two characters U+00CF and U+0080

No, it is complex to explain and understand, but once you "get it" you shouldn't have any further problems.

There are (at least) three character encodings in any Ada program, usually more. It is nice if they can all match, but that can be problematic. The first is the encoding used in source files. This obviously can't be chosen by a pragma, and GNAT uses -gnatW8 to force UTF-8. Since the Ada standard doesn't say anything about the operating system instructions used to call the compiler, this is fine. Note that the printer character set used for program listings can be different, and the same goes for debugger settings and the character set used by your terminal.

Second are the character set(s) used when an Ada program reads from or writes to files or other devices. The Ada standard defines that binding for Text_IO, Wide_Text_IO, etc. Notice that this is about character sets, not character encodings. You could have a Form string "UTF8" for files, with a Wide_Text_IO version that understood it. Same for "Unicode" and so on. The program would only see ISO-10646 characters, but the generated files would be much smaller. ;-)

Finally there is the representation of (ISO-10646) characters in source files. Yes, if the source file uses UTF8 you are fine. Well, not really. The terminal that you use to create the file may not have the characters you need to use, or in the case of control characters, they may have a different meaning to the compiler. The standard includes rules for such encodings, and there are (standard) packages which need to use them.

Anyway, you get the standard-defined behavior when using -gnatW8, so there is no bug. The "Ï€" or whatever shows up is probably governed by your terminal's character set.
I don't see a listing with Wide_Character(16#CF80#) but you would need an output program that supports it. (The nice thing about electronic displays is they can support dozens of different character encodings.)
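
To make the byte-level picture concrete, here is a minimal sketch, assuming an Ada 2012 compiler (the procedure name Show_Pi_Bytes is only for illustration and is not from the thread). It builds U+03C0 from its code point, so the source file stays pure ASCII and does not depend on -gnatW8, then uses the standard package Ada.Strings.UTF_Encoding.Wide_Strings to produce the UTF-8 form and prints each byte as a number. Only ASCII digits reach the terminal, so the terminal's character set cannot affect what you see.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Show_Pi_Bytes is
   --  One wide character, U+03C0, given by code point rather than as a
   --  literal, so the source encoding plays no part in this example.
   Pi : constant Wide_String := (1 => Wide_Character'Val (16#03C0#));

   --  Standard Ada 2012 conversion from Wide_String to a UTF-8 String.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Pi);
begin
   Ada.Text_IO.Put_Line
     ("Wide_String length:" & Integer'Image (Pi'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 length:" & Integer'Image (Encoded'Length));

   --  Print the position value of each Character in the UTF-8 form.
   for C of Encoded loop
      Ada.Text_IO.Put_Line (Integer'Image (Character'Pos (C)));
   end loop;
end Show_Pi_Bytes;
```

The Wide_String has length 1 while its UTF-8 form has length 2, with byte values 207 and 128 (16#CF# and 16#80#). Those are the same two values that, read as individual Characters instead of as one UTF-8 sequence, give the U+00CF and U+0080 mentioned in the quoted message.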