comp.lang.ada
From: Michael Rohan <michael@zanyblue.com>
Subject: UTF-8 Output and "-gnatW8"
Date: Thu, 24 Mar 2016 10:23:51 -0700 (PDT)
Message-ID: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com>

Hi Folks,

I'm seeing what I suspect is the GNAT run-time re-encoding an already UTF-8 encoded string when the "-gnatW8" option is used.  The help info on "-gnatW8" states

-gnatW?   Wide character encoding method (?=h/u/s/e/8/b)

I've been using this option to state that my source files are UTF-8 encoded, but I don't particularly want to change the behaviour of the Ada.Text_IO routines.  I don't see an option that covers just the source file encoding without impacting the Text_IO (narrow) functionality.
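One possible way to keep already-encoded bytes intact, regardless of how the unit is compiled, is to write them through the stream interface instead of Put_Line. The sketch below assumes that stream writes on a Text_IO file are passed through unchanged by the GNAT run-time; that is an assumption to verify against your GNAT version, and the procedure name Raw_Out is purely illustrative.

```ada
with Ada.Text_IO;
with Ada.Text_IO.Text_Streams;

--  Hypothetical example: emit an already UTF-8 encoded string as raw
--  bytes via the stream view of standard output.
procedure Raw_Out is
   Stdout : constant Ada.Text_IO.Text_Streams.Stream_Access :=
     Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output);

   --  16#CF# 16#80# is the UTF-8 encoding of U+03C0 (Greek small pi).
   Msg : constant String :=
     "It's " & Character'Val (16#CF#) & Character'Val (16#80#) & " day.";
begin
   --  String'Write outputs the characters only, with no bounds.
   String'Write (Stdout, Msg);
   Character'Write (Stdout, ASCII.LF);
end Raw_Out;
```

Mixing stream writes with ordinary Put calls on the same file can confuse Text_IO's column and line tracking, so this is best kept to output that is written entirely one way or the other.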

I'm going to adjust my build process to use "-gnatW8" only when compiling sources that contain non-ASCII, UTF-8 characters.

It's pretty easy to see this.  Here's an example using a string that is already UTF-8 encoded:

with Ada.Text_IO;
procedure PiDay is
begin
   Ada.Text_IO.Put_Line (
      "It's " & Character'Val (16#CF#) & Character'Val (16#80#) & " day.");
end PiDay;

Building and executing with and without "-gnatW8" gives

$ gnatmake piday
gcc -c piday.adb
gnatbind -x piday.ali
gnatlink piday.ali
$ ./piday 
It's π day.
$ touch piday.adb 
$ gnatmake -gnatW8 piday
gcc -c -gnatW8 piday.adb
gnatbind -x piday.ali
gnatlink piday.ali
$ ./piday 
It's π day.
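
To make the suspected re-encoding concrete, here is a small sketch of the arithmetic. It treats each byte of the already encoded string as a Latin-1 code point and UTF-8 encodes it again, which is what the run-time would be doing if the suspicion is right. The names Double_Encode and Put_Hex are made up for illustration; this is not GNAT run-time code.

```ada
with Ada.Text_IO;

--  Hypothetical illustration of UTF-8 encoding applied a second time.
procedure Double_Encode is
   Hex : constant String := "0123456789ABCDEF";

   --  Print one byte value (0 .. 255) as two hex digits and a space.
   procedure Put_Hex (B : Natural) is
   begin
      Ada.Text_IO.Put (Hex (B / 16 + 1) & Hex (B mod 16 + 1) & ' ');
   end Put_Hex;

   --  The two bytes used in PiDay: the UTF-8 encoding of U+03C0.
   Input : constant String :=
     Character'Val (16#CF#) & Character'Val (16#80#);
begin
   for C of Input loop
      declare
         Code : constant Natural := Character'Pos (C);
      begin
         if Code < 16#80# then
            --  ASCII is a single byte in UTF-8.
            Put_Hex (Code);
         else
            --  Code points 16#80# .. 16#FF# need two bytes:
            --  110xxxxx 10xxxxxx.
            Put_Hex (16#C0# + Code / 64);
            Put_Hex (16#80# + Code mod 64);
         end if;
      end;
   end loop;
   Ada.Text_IO.New_Line;
end Double_Encode;
```

This prints C3 8F C2 80, that is, U+00CF (Ï) followed by the control character U+0080, rather than the original CF 80. Piping the output of ./piday through a hex dump is an easy way to check which byte sequence each build actually produces.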

The RM (section 2.1) includes an "Implementation Requirement":

16/3
 An Ada implementation shall accept Ada source code in UTF-8 encoding, with or without a BOM (see A.4.11), where every character is represented by its code point. The character pair CARRIAGE RETURN/LINE FEED (code points 16#0D# 16#0A#) signifies a single end of line (see 2.2); every other occurrence of a format_effector other than the character whose code point position is 16#09# (CHARACTER TABULATION) also signifies a single end of line.

It feels like we should be able to explicitly define the encoding of a source file via a pragma:

    pragma Character_Set ("UTF-8");
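
For comparison, GNAT documents an implementation-defined pragma, Wide_Character_Encoding, that sets the encoding of the source text that follows it. The sketch below shows how it might be placed at the top of a unit. Whether using the pragma in place of "-gnatW8" leaves Text_IO output untouched is an assumption that should be checked against the GNAT documentation; the unit name Pi_Length is hypothetical.

```ada
--  GNAT-specific pragma: the source text that follows is UTF-8.
pragma Wide_Character_Encoding (UTF8);

with Ada.Text_IO;

--  Hypothetical example: a non-ASCII literal in UTF-8 source.
procedure Pi_Length is
   --  If the literal is decoded as UTF-8, it holds one character.
   Pi : constant Wide_Wide_String := "π";
begin
   Ada.Text_IO.Put_Line
     ("Characters in literal:" & Integer'Image (Pi'Length));
end Pi_Length;
```

Because the pragma is not standard Ada, other compilers may reject or ignore it, so it does not replace the portable pragma suggested above.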

Take care,
Michael.

