comp.lang.ada
* UTF-8 Output and "-gnatW8"
@ 2016-03-24 17:23 Michael Rohan
  2016-03-24 22:09 ` Randy Brukardt
  2016-03-25  5:54 ` rieachus
From: Michael Rohan @ 2016-03-24 17:23 UTC


Hi Folks,

I'm seeing what I suspect is the GNAT run time re-encoding an already UTF-8 encoded string when the "-gnatW8" option is used.  The help info on "-gnatW8" states:

-gnatW?   Wide character encoding method (?=h/u/s/e/8/b)

I've been using this option to state that my source files are UTF-8 encoded, but I don't particularly want to change the behaviour of the Ada.Text_IO routines.  I don't see an option that covers just the source file encoding without affecting the (narrow) Text_IO functionality.

I'm going to adjust my build process to use "-gnatW8" only when compiling sources that contain non-ASCII UTF-8 characters.

It's pretty easy to see this.  Here's an example that outputs an already UTF-8 encoded string:

with Ada.Text_IO;
procedure PiDay is
begin
   Ada.Text_IO.Put_Line (
      "It's " & Character'Val (16#CF#) & Character'Val (16#80#) & " day.");
end PiDay;

Building and executing with and without "-gnatW8" gives:

$ gnatmake piday
gcc -c piday.adb
gnatbind -x piday.ali
gnatlink piday.ali
$ ./piday 
It's π day.
$ touch piday.adb 
$ gnatmake -gnatW8 piday
gcc -c -gnatW8 piday.adb
gnatbind -x piday.ali
gnatlink piday.ali
$ ./piday 
It's π day.
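
The suspected double encoding can be reproduced by hand. The following is only a sketch, and it assumes that with "-gnatW8" the run time treats each narrow Character as a Latin-1 code point and encodes it as UTF-8 on output. The procedure name Double_Encode is made up for illustration.

```ada
with Ada.Text_IO;
with Ada.Integer_Text_IO;
with Ada.Characters.Conversions;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Double_Encode is
   --  The two bytes of the UTF-8 encoding of U+03C0 (pi), held in a
   --  narrow String exactly as in PiDay.
   Raw : constant String :=
     Character'Val (16#CF#) & Character'Val (16#80#);

   --  Assumption: each Character is read as a Latin-1 code point,
   --  i.e. U+00CF and U+0080.
   Wide : constant Wide_String :=
     Ada.Characters.Conversions.To_Wide_String (Raw);

   --  Encode those code points as UTF-8 a second time.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Wide);
begin
   --  Print each resulting byte in base 16.
   for C of Encoded loop
      Ada.Integer_Text_IO.Put (Character'Pos (C), Width => 0, Base => 16);
      Ada.Text_IO.Put (' ');
   end loop;
   Ada.Text_IO.New_Line;
end Double_Encode;
```

Under that assumption the two input bytes CF 80 become the four bytes C3 8F C2 80, which a UTF-8 terminal no longer displays as a single pi character.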

The RM includes an "Implementation Requirement" in 2.1:

16/3
 An Ada implementation shall accept Ada source code in UTF-8 encoding, with or without a BOM (see A.4.11), where every character is represented by its code point. The character pair CARRIAGE RETURN/LINE FEED (code points 16#0D# 16#0A#) signifies a single end of line (see 2.2); every other occurrence of a format_effector other than the character whose code point position is 16#09# (CHARACTER TABULATION) also signifies a single end of line.

It feels like we should be able to explicitly define the encoding for a source file via a pragma:

    pragma Character_Set ("UTF-8");
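
For comparison, GNAT already documents an implementation-defined pragma along these lines, Wide_Character_Encoding, which sets the encoding of the source text that follows it. The sketch below is hedged: the unit name Pi_Day_Wide is made up, and whether the pragma leaves the Ada.Text_IO run-time behaviour untouched in the way "-gnatW8" does not is an assumption that has not been verified here.

```ada
--  GNAT-specific pragma; UTF8 is one of the encoding identifiers it
--  accepts.  It applies to the source text appearing after it.
pragma Wide_Character_Encoding (UTF8);

with Ada.Wide_Text_IO;

procedure Pi_Day_Wide is
begin
   --  With the pragma in effect the literal below is scanned as UTF-8,
   --  so the pi character is a single Wide_Character (U+03C0).
   Ada.Wide_Text_IO.Put_Line ("It's π day.");
end Pi_Day_Wide;
```

This only addresses how the compiler reads the file; it is not portable to other Ada compilers.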

Take care,
Michael.


Thread overview: 12+ messages
2016-03-24 17:23 UTF-8 Output and "-gnatW8" Michael Rohan
2016-03-24 22:09 ` Randy Brukardt
2016-03-24 22:34   ` Michael Rohan
2016-03-25 19:15     ` Randy Brukardt
2016-03-25  5:54 ` rieachus
2016-03-25 19:18   ` Randy Brukardt
2016-03-28 22:48     ` Michael Rohan
2016-03-29  7:44       ` Dmitry A. Kazakov
2016-03-29  8:39       ` G.B.
2016-03-29 22:35       ` Randy Brukardt
2016-04-04 10:52         ` G.B.
2016-04-05  0:39           ` Randy Brukardt
