From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!mx02.eternal-september.org!feeder.eternal-september.org!news.albasani.net!reality.xs3.de!news.jacob-sparre.dk!loke.jacob-sparre.dk!pnx.dk!.POSTED!not-for-mail
From: "Randy Brukardt" <randy@rrsoftware.com>
Newsgroups: comp.lang.ada
Subject: Re: UTF-8 Output and "-gnatW8"
Date: Fri, 25 Mar 2016 14:15:00 -0500
Organization: JSA Research & Innovation
Message-ID: <nd42nl$fj5$1@loke.gir.dk>
References: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com>
 <nd1oir$c8r$1@loke.gir.dk>
 <4f157cc0-1d3c-46d7-ab19-88a13ae0afd0@googlegroups.com>
NNTP-Posting-Host: rrsoftware.com
X-Trace: loke.gir.dk 1458933301 15973 24.196.82.226 (25 Mar 2016 19:15:01 GMT)
X-Complaints-To: news@jacob-sparre.dk
NNTP-Posting-Date: Fri, 25 Mar 2016 19:15:01 +0000 (UTC)
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157
Xref: news.eternal-september.org comp.lang.ada:29885
Date: 2016-03-25T14:15:00-05:00
List-Id: <comp.lang.ada>

"Michael Rohan" <michael@zanyblue.com> wrote in message 
news:4f157cc0-1d3c-46d7-ab19-88a13ae0afd0@googlegroups.com...
>Hi,
>
>OK, so this might be a compiler bug.  The RM states the character set 
>should
>be ISO 10646 so EBCDIC would seem to be something that is not allowed.

Ah, that's a common mistake. The RM specifies what the *runtime* character 
set is. Prior to Ada 2012, it said *nothing* about the encoding of Ada 
source code, and even now, it only talks about UTF-8 as one possibility for 
that encoding. Anything else is allowed, including EBCDIC, Shift-JIS, or 
even some sort of tree (the latter is explicitly mentioned as a possibility 
in the AARM). Someone even suggested a source representation where '{' = 
"begin", '}' = "end", etc. (It was that suggestion that finally got the UTF-8 
"standard" encoding into the Standard, to provide real interoperability for 
Ada source code.)

>The implementation for GNAT impacts the handling of strings, e.g.,
>
>S : constant Wide_String := "π";
>
>With "-gnatW8" this is correctly interpreted as a string of length 1
>containing the character U+03C0.  Without the "-gnatW8" option, GNAT
>interprets it as a string of Characters to convert to a Wide_String,
>i.e., the two characters U+00CF and U+0080

That seems right to me. (In a new compiler, I'd make UTF-8 the default, but 
any existing compiler probably would have to make it a switch of some sort.) 
But that's because you have a UTF-8 character in the source code.
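
To make the dependence on source representation concrete, here is a small 
sketch. It is only an illustration, not something taken from your report: 
the unit name Literal_Demo is made up, and it assumes the file is saved as 
UTF-8, so that the literal is stored as the two bytes 16#CF# 16#80#.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Literal_Demo is
   --  Assumption: this file is saved as UTF-8, so the literal below is
   --  stored as the two bytes 16#CF# 16#80#.
   S : constant Wide_String := "π";
begin
   --  Print the length of the string, then the code of each element.
   Put_Line ("Length:" & Integer'Image (S'Length));
   for I in S'Range loop
      Put_Line (Integer'Image (Wide_Character'Pos (S (I))));
   end loop;
end Literal_Demo;
```

Going by your description, building this with "-gnatW8" should report a 
length of 1 and the code 960 (16#03C0#); without the switch, a length of 2 
and the codes 207 and 128 (16#CF# and 16#80#).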

The bug is that you said that some source code with no explicit UTF-8 
characters (rather representing them as Character'Val(16#C0#) and the like) 
was changing behavior in response to such a switch. That's a bug in my 
view (Character'Val(16#C0#) isn't a character literal at compile-time, it's a 
function call, and its representation is the same regardless of whether the 
source is read as 7-bit ASCII or UTF-8).

>Is the constant string value ambiguous here?

It means something different depending upon the source representation. I 
believe GNAT is getting that correct.

"Character'Val(16#C0#)" means the same thing in either source 
representation, so you should get the same results for the program 
containing that. If you don't, that's a bug.
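
As a rough check of that, consider a sketch like the following (the unit 
name Val_Demo is hypothetical). The source contains nothing but 7-bit ASCII, 
so the encoding used to read it should make no difference:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Val_Demo is
   --  Only 7-bit ASCII appears in this source, so the encoding chosen
   --  for reading the file should not matter.
   C : constant Character := Character'Val (16#C0#);
   S : constant String := (1 => C);
begin
   Put_Line (Integer'Image (Character'Pos (C)));  --  position of C
   Put_Line (Integer'Image (S'Length));           --  length of S
end Val_Demo;
```

I'd expect this to print 192 and 1 both with and without "-gnatW8". It 
prints only numbers on purpose, so that the encoding used for the output 
itself doesn't enter into the comparison.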

Hope this clears it up.

                                    Randy.
