From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: "G.B." <bauhaus@notmyhomepage.invalid>
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Thu, 28 Dec 2017 23:35:58 +0100
Organization: A noiseless patient Spider
Message-ID: <p23rk7$5dv$1@dont-email.me>
References: <ccd8e071-c228-4518-967e-09011cd5e291@googlegroups.com>
 <892d5b9a-6460-419a-a09d-d00a4b84c668@googlegroups.com>
 <p22ute$cc4$1@gioia.aioe.org> <lyfu7uexeh.fsf@pushface.org>
 <fakgt5F8tjeU1@mid.individual.net>
 <bec4557c-9fe1-43a0-a0d5-aadede7cab39@googlegroups.com>
Reply-To: nonlegitur@notmyhomepage.de
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 28 Dec 2017 22:35:51 -0000 (UTC)
Injection-Info: reader02.eternal-september.org;
 posting-host="cc3ee168e2a8f8768fdc02de8cd45ecc";
	logging-data="5567"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX18eaX4s4F8eybCsLlKToW6aqeosOcFmv8M="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0)
 Gecko/20100101 Thunderbird/52.5.0
In-Reply-To: <bec4557c-9fe1-43a0-a0d5-aadede7cab39@googlegroups.com>
Content-Language: en-US
Cancel-Lock: sha1:fkxB+DzmCHwCyhVT2m33u63TCUE=
Xref: reader02.eternal-september.org comp.lang.ada:49682
Date: 2017-12-28T23:35:58+01:00
List-Id: <comp.lang.ada>

On 28.12.17 16:47, 00120260b@gmail.com wrote:
> Then, how come the norm hasn't made it a bit easier to input/output post-latin-1 characters? Why aren't other norms/character sets/encodings more like special cases?
> 

Actually, output of non-7-bit, unambiguously encoded text
has been made reasonably easy, I'd say, also defaulting
to what should be expected:

with Ada.Wide_Text_IO.Text_Streams;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure UTF is
    --  USD/EUR, i.e. "$/€"
    Ratio : constant Wide_String := "$/" & Wide_Character'Val (16#20AC#);

    use Ada.Wide_Text_IO, Ada.Strings;
begin
    Put_Line (Ratio); --  use defaults, traditional
    String'Write --  stream output, force UTF-8
      (Text_Streams.Stream (Current_Output),
       UTF_Encoding.Wide_Strings.Encode (Ratio));
end UTF;
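
To see what "force UTF-8" means at the byte level, here is a
minimal sketch (the name Show_Bytes is mine, purely for
illustration) that prints the position value of every byte that
Encode produces for the same string. For "$/€" the UTF-8 form
should be the five bytes 24 2F E2 82 AC (hex), the last three
encoding U+20AC.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Show_Bytes is
   use Ada.Strings.UTF_Encoding;

   --  Same 7-bit spelling of "$/€" as in procedure UTF
   Ratio   : constant Wide_String := "$/" & Wide_Character'Val (16#20AC#);

   --  The declared subtype selects the UTF-8 overload of Encode
   Encoded : constant UTF_8_String := Wide_Strings.Encode (Ratio);
begin
   --  Print each encoded byte as a decimal number
   for C of Encoded loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end Show_Bytes;
```

This only uses the plain Character-based Text_IO, so it does not
depend on how Wide_Text_IO is configured.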

The above source text uses only 7-bit characters, even for the
post-Latin-1 string: the euro sign is written as
Wide_Character'Val (16#20AC#). Only the comment text contains
a non-ASCII character.

If, instead, the source text itself is encoded using more than
7 bits and contains post-Latin-1 literals or identifiers, then
the compiler may need to be told which encoding is in use.
I think that BOMs may be of use, and in any case there are
compiler switches or some other vendor-specific means of
describing the encoding of source text.
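
As one vendor-specific illustration, and assuming GNAT: it has
a pragma Wide_Character_Encoding, and a -gnatW8 switch, for
declaring that source text is UTF-8. The sketch below (the name
UTF_Literal is hypothetical) assumes the file is saved as UTF-8;
other compilers will have their own mechanism.

```ada
--  Assumption: this file is saved as UTF-8 and compiled with GNAT.
pragma Wide_Character_Encoding (UTF8);  --  GNAT-specific pragma

with Ada.Text_IO;

procedure UTF_Literal is
   Ratio : constant Wide_String := "$/€";  --  non-ASCII literal in the source
begin
   --  If the literal was decoded as UTF-8, Ratio has three characters
   --  and the last one has code point 16#20AC# (8364).
   Ada.Text_IO.Put_Line (Integer'Image (Ratio'Length));
   Ada.Text_IO.Put_Line
     (Integer'Image (Wide_Character'Pos (Ratio (Ratio'Last))));
end UTF_Literal;
```

Printing the length and the last code point, rather than the
string itself, keeps the check independent of whatever encoding
the output side happens to use.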