comp.lang.ada
From: rieachus@comcast.net
Subject: Re: UTF-8 Output and "-gnatW8"
Date: Thu, 24 Mar 2016 22:54:31 -0700 (PDT)
Message-ID: <3a65e71c-41ee-49eb-916d-c0be8be9abc6@googlegroups.com>
In-Reply-To: <35689862-61dc-4186-87d3-37b17abed5a2@googlegroups.com>

On Thursday, March 24, 2016, Michael Rohan wrote:
> The implementation for GNAT impacts the handling of strings, e.g., 
>
> S : constant Wide_String := "π"; 

> With "-gnatW8" this is correctly interpreted as a string of length 1 
> containing the character U+03C0.  Without the "-gnatW8" option, GNAT 
> interprets it as a string of Characters to convert to a Wide_String, 
> i.e., the two characters U+00CF and U+0080

No, it is complex to explain and understand, but once you "get it" you shouldn't have any further problems.

There are (at least) three character encodings in any Ada program, and usually more.  It is nice if they can all match, but that can be problematic.  The first is the encoding used in source files.  This obviously can't be chosen by a pragma, so GNAT uses the -gnatW8 switch to force UTF-8.  Since the Ada standard doesn't say anything about the operating system commands used to invoke the compiler, this is fine.  Note that the character set of the printer used for program listings can be different, as can the debugger's settings and the character set used by your terminal.

Second are the character set(s) used when an Ada program reads from or writes to files or other devices.  The Ada standard defines that binding for Text_IO, Wide_Text_IO, etc.  Notice that this is about character sets, not character encodings.  You could have a Form string "UTF8" for files, with a Wide_Text_IO implementation that understood it.  The same goes for "Unicode" and so on.  The program would only see ISO-10646 characters, but the generated files would be much smaller. ;-)
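
To make the character set versus encoding distinction above concrete, here is a small Python sketch (Python is used only because it makes the bytes easy to inspect; the sample text and the helper name `encoded_sizes` are made up for illustration and are not from the thread). The same sequence of ISO-10646 characters is encoded three ways, and only the byte counts differ:

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in several encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),          # 1 byte per ASCII character
        "utf-16-le": len(text.encode("utf-16-le")),  # 2 bytes per BMP character, no BOM
        "utf-32-le": len(text.encode("utf-32-le")),  # 4 bytes per character, no BOM
    }

# Hypothetical sample: mostly ASCII, with one non-ASCII character (pi, U+03C0).
sample = "Area := \u03c0 * R ** 2;"
sizes = encoded_sizes(sample)
print(len(sample), sizes)
```

For a mostly-ASCII sample like this one, the UTF-8 form comes out at roughly half the size of the 16-bit form, while the characters the program sees are identical in every case.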

Finally, there is the representation of (ISO-10646) characters in source files.  Yes, if the source file uses UTF8 you are fine.  Well, not really.  The terminal that you use to create the file may not have the characters you need to use, or, in the case of control characters, they may have a different meaning to the compiler.  The standard includes rules for such encodings, and there are (standard) packages which need to use them.

Anyway, you get the standard-defined behavior when using -gnatW8, so there is no bug.  The "Ï€" or whatever shows up is probably governed by your terminal's character set.  I don't see a listing with Wide_Character(16#CF80#), but you would need an output program that supports it.  (The nice thing about electronic displays is that they can support dozens of different character encodings.)
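
The reinterpretation discussed in this thread can be reproduced outside the compiler. The following Python sketch is an illustration only and says nothing about how GNAT is implemented. It takes the two bytes that UTF-8 uses for π and decodes them three ways: as UTF-8, as Latin-1 (one character per byte), and as Windows-1252. The last choice is an assumption about the terminal that displayed "Ï€".

```python
# The bytes stored in a UTF-8 source file for the character pi (U+03C0).
raw = "\u03c0".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # UTF-8 reading: a single character
as_latin1 = raw.decode("latin-1")  # one character per byte: two characters
as_cp1252 = raw.decode("cp1252")   # assumed terminal code page (Windows-1252)

print(raw.hex())                          # cf80
print([hex(ord(c)) for c in as_utf8])     # ['0x3c0']
print([hex(ord(c)) for c in as_latin1])   # ['0xcf', '0x80']
print([hex(ord(c)) for c in as_cp1252])   # ['0xcf', '0x20ac']
```

The Latin-1 reading gives the two characters U+00CF and U+0080 from the quoted message. Under the Windows-1252 assumption, byte 16#80# is shown as the euro sign, which would account for the "Ï€" on screen. Neither reading is the single 16-bit value 16#CF80#.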
