Newsgroups: comp.lang.ada
Date: Mon, 28 Mar 2016 15:48:10 -0700 (PDT)
References: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com> <3a65e71c-41ee-49eb-916d-c0be8be9abc6@googlegroups.com>
Message-ID: <6406289c-06a8-46d1-a633-8a1c8a22f79b@googlegroups.com>
Subject: Re: UTF-8 Output and "-gnatW8"
From: Michael Rohan

On Friday, March 25, 2016 at 12:19:01 PM UTC-7, Randy Brukardt wrote:
> wrote in message
> news:3a65e71c-41ee-49eb-916d-c0be8be9abc6@googlegroups.com...
> ...
> >Second are the character set(s) when an Ada program reads from or writes to
> >files or other devices.
> >The Ada standard defines that binding for Text_IO, Wide_Text_IO,
> >etc. Notice that this is about character sets, not character encodings.
> >You could have a Form string "UTF8" for files, with a Wide_Text_IO version
> >that understood it. Same for "Unicode" and so on. The program would only
> >see ISO-10646 characters, but the generated files would be much smaller. ;-)
>
> Or you could use Ada.Strings.Encodings to convert the string to UTF-8 and
> then use Ada.Text_IO to output it. (This is an end-run round strong typing,
> sadly, but it works.)
>
> Randy.
>
> P.S. Robert, nice to hear from you again. It's been a while, hope you're
> doing well.

Hi,

My approach is to do the encoding myself and write the encoded Characters
via Text_IO, reserving -gnatW8 for just those source files that contain
UTF-8 data.

It does, however, feel like something is missing: it's "difficult" to write
a Wide_String literal without either giving the compiler extra metadata
(-gnatW8) or resorting to a relatively cumbersome concatenation of
Wide_Characters built from code points. BTW, GNAT's performance on such a
concatenated string is pretty dismal.

I'm not really advocating C/C++-style backslash escapes (e.g., \x, \u, \U),
but it would be "nice" to be able to express such constant strings easily.
It was mentioned that Wide_Character'Val requires elaboration. Presumably a
compiler could optimize that away, but I'm not sure whether it's allowed to.

Take care,
Michael.
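For reference, a minimal sketch of the convert-then-Text_IO route, assuming an Ada 2012 compiler. The standard package for this conversion is Ada.Strings.UTF_Encoding, whose child Wide_Strings provides Encode. The non-ASCII character is built from its code point with Wide_Character'Val, so the source stays pure ASCII and needs no -gnatW8. The procedure name UTF8_Demo is just illustrative.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure UTF8_Demo is
   use Ada.Strings.UTF_Encoding.Wide_Strings;

   --  "caf" followed by U+00E9 (LATIN SMALL LETTER E WITH ACUTE), assembled
   --  from a code point so this unit does not need to be compiled with -gnatW8.
   Cafe : constant Wide_String := "caf" & Wide_Character'Val (16#E9#);
begin
   --  Encode (Wide_String -> UTF_8_String) returns a plain String holding
   --  the UTF-8 bytes (16#C3# 16#A9# for U+00E9); with GNAT, Ada.Text_IO
   --  passes those bytes through unchanged.
   Ada.Text_IO.Put_Line (Encode (Cafe));
end UTF8_Demo;
```

Built without -gnatW8, this should display "café" on a UTF-8 terminal. It is exactly the end-run Randy mentions: UTF_8_String is only a subtype of String, so the type system does not distinguish encoded bytes from Latin-1 text.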