From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: UTF-8 Output and "-gnatW8"
Date: Fri, 25 Mar 2016 14:15:00 -0500
Message-ID: <nd42nl$fj5$1@loke.gir.dk>
In-Reply-To: 4f157cc0-1d3c-46d7-ab19-88a13ae0afd0@googlegroups.com
"Michael Rohan" <michael@zanyblue.com> wrote in message
news:4f157cc0-1d3c-46d7-ab19-88a13ae0afd0@googlegroups.com...
>Hi,
>
>OK, so this might be a compiler bug. The RM states the character set should
>be ISO 10646 so EBCDIC would seem to be something that is not allowed.
Ah, that's a common mistake. The RM specifies what the *runtime* character
set is. Prior to Ada 2012, it said *nothing* about the encoding of Ada
source code, and even now, it only talks about UTF-8 as one possibility for
that encoding. Anything else is allowed, including EBCDIC, Shift-JIS, or
even some sort of tree (the latter is explicitly mentioned as a possibility
in the AARM). Someone even suggested a source representation where '{' =
"begin", '}' = "end", etc. (It was that suggestion that finally got the UTF-8
"standard" encoding into the Standard, to provide real interoperability for
Ada source code.)
>The implementation for GNAT impacts the handling of strings, e.g.,
>
>S : constant Wide_String := "π";
>
>With "-gnatW8" this is correctly interpreted as a string of length 1
>containing the character U+03C0. Without the "-gnatW8" option, GNAT
>interprets it as a string of Characters to convert to a Wide_String,
>i.e., the two characters U+00CF and U+0080.
That seems right to me. (In a new compiler, I'd make UTF-8 the default, but
any existing compiler probably would have to make it a switch of some sort.)
But that's because you have a UTF-8 character in the source code.
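To make the two readings concrete, here is a minimal sketch (assuming an Ada 2012 compiler, since it uses Ada.Strings.UTF_Encoding; the procedure name Pi_Bytes is just for illustration). It builds U+03C0 from its code point rather than from a literal, encodes it as UTF-8, and shows that the result is two bytes, 16#CF# and 16#80#, which is exactly what a compiler sees if it reads the literal one byte per Character:

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Pi_Bytes is
   use Ada.Strings.UTF_Encoding;

   --  U+03C0 built from its code point, so this file contains only
   --  ASCII and reads the same under any source encoding.
   Pi      : constant Wide_String   := (1 => Wide_Character'Val (16#03C0#));
   Encoded : constant UTF_8_String  := Wide_Strings.Encode (Pi);
begin
   --  One wide character becomes two bytes in UTF-8.
   Ada.Text_IO.Put_Line (Natural'Image (Encoded'Length));   --  value 2

   --  The two bytes are 16#CF# (207) and 16#80# (128).
   for I in Encoded'Range loop
      Ada.Text_IO.Put_Line (Natural'Image (Character'Pos (Encoded (I))));
   end loop;
end Pi_Bytes;
```

Read as UTF-8, those two bytes are one Wide_Character; read as 8-bit characters, they are two.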
The bug is that you said that some source code with no explicit UTF-8
characters (rather representing them as Character'Val(16#C0#) and the like)
was changing behavior in response to such a switch. That's a bug in my
view (Character'Val(16#C0#) isn't a character literal at compile-time, it's a
function call, and its representation is the same regardless of whether the
source is read as 7-bit ASCII or UTF-8).
>Is the constant string value ambiguous here?
It means something different depending upon the source representation. I
believe GNAT is getting that correct.
"Character'Val(16#C0#)" means the same thing in either source
representation, so you should get the same results for the program
containing that. If you don't, that's a bug.
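A minimal sketch of the kind of program I mean (the name Val_Demo is just for illustration). Nothing in this source text is outside ASCII, so I would expect the values below to be identical with and without "-gnatW8":

```ada
with Ada.Text_IO;

procedure Val_Demo is
   --  The character is built from its position number, not from a
   --  literal, so the source encoding has nothing to reinterpret.
   C : constant Character := Character'Val (16#C0#);
   S : constant String    := (1 => C);
begin
   Ada.Text_IO.Put_Line (Integer'Image (Character'Pos (C)));   --  value 192
   Ada.Text_IO.Put_Line (Integer'Image (S'Length));            --  value 1
end Val_Demo;
```

This only checks the internal value and length; how the character is then written to a file or terminal is a separate question.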
Hope this clears it up.
Randy.