From: Michael Rohan <michael@zanyblue.com>
Subject: UTF-8 Output and "-gnatW8"
Date: Thu, 24 Mar 2016 10:23:51 -0700 (PDT)
Message-ID: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com>
Hi Folks,
I'm seeing what I suspect is the GNAT run-time re-encoding an already UTF-8 encoded string when the "-gnatW8" option is used. The help info on "-gnatW8" states:
-gnatW? Wide character encoding method (?=h/u/s/e/8/b)
I've been using this option to state that my source files are UTF-8 encoded, but I don't particularly want to change the behaviour of the Ada.Text_IO routines. I don't see an option that covers just the source file encoding without impacting the Text_IO (narrow) functionality.
I'm going to adjust my build process to only use "-gnatW8" when compiling sources that contain non-ASCII, UTF-8 characters.
It's pretty easy to see this. Here's an already UTF-8 encoded string example:
with Ada.Text_IO;
procedure PiDay is
begin
   Ada.Text_IO.Put_Line (
      "It's " & Character'Val (16#CF#) & Character'Val (16#80#) & " day.");
end PiDay;
Building and executing with and without "-gnatW8" gives:
$ gnatmake piday
gcc -c piday.adb
gnatbind -x piday.ali
gnatlink piday.ali
$ ./piday
It's π day.
$ touch piday.adb
$ gnatmake -gnatW8 piday
gcc -c -gnatW8 piday.adb
gnatbind -x piday.ali
gnatlink piday.ali
$ ./piday
It's Ï day.
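The second output is what one would expect if each byte of the string were treated as a Latin-1 character and UTF-8 encoded a second time: 16#CF# 16#80# would become 16#C3# 16#8F# 16#C2# 16#80#, which a UTF-8 terminal shows as "Ï" followed by an invisible control character. The following is only a sketch of that suspected effect using the standard Ada 2012 package Ada.Strings.UTF_Encoding.Strings. It is not taken from the GNAT run-time, and the procedure name Double_Encode is made up for illustration.

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding.Strings;

--  Hypothetical demo (name invented here): re-encode bytes that are
--  already UTF-8, to mimic the suspected double encoding.
procedure Double_Encode is

   --  The two UTF-8 bytes of U+03C0 (pi), held as Latin-1 characters,
   --  exactly as in the PiDay example.
   Pi_Bytes : constant String :=
     Character'Val (16#CF#) & Character'Val (16#80#);

   --  Encode treats each element of a String as a Latin-1 code point
   --  and produces its UTF-8 representation.
   Twice : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Pi_Bytes);

begin
   --  Print each resulting byte in base 16.
   for C of Twice loop
      Ada.Integer_Text_IO.Put (Character'Pos (C), Width => 0, Base => 16);
      Ada.Text_IO.Put (' ');
   end loop;
   Ada.Text_IO.New_Line;
end Double_Encode;
```

Only ASCII characters are written by this program, so its output should not depend on whether "-gnatW8" is used to compile it.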
The RM includes an "Implementation Requirement":
16/3
An Ada implementation shall accept Ada source code in UTF-8 encoding, with or without a BOM (see A.4.11), where every character is represented by its code point. The character pair CARRIAGE RETURN/LINE FEED (code points 16#0D# 16#0A#) signifies a single end of line (see 2.2); every other occurrence of a format_effector other than the character whose code point position is 16#09# (CHARACTER TABULATION) also signifies a single end of line.
It feels like we should be able to explicitly define the encoding for a source file via a pragma:
pragma Character_Set ("UTF-8");
Take care,
Michael.