From: "G.B."
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Thu, 28 Dec 2017 23:35:58 +0100

On 28.12.17 16:47, 00120260b@gmail.com wrote:
> Then, how come the norm hasn't made it a bit easier to input/output
> post-latin-1 characters ? Why aren't other norms/characters
> set/encodings more like special cases ?

Actually, output of non-7-bit, unambiguously encoded text has
been made reasonably easy, I'd say, also defaulting to what
should be expected:

    with Ada.Wide_Text_IO.Text_Streams;
    with Ada.Strings.UTF_Encoding.Wide_Strings;

    procedure UTF is
       --  USD/EUR, i.e. "$/€"
       Ratio : constant Wide_String :=
         "$/" & Wide_Character'Val (16#20AC#);
       use Ada.Wide_Text_IO, Ada.Strings;
    begin
       Put_Line (Ratio);  --  use defaults, traditional
       String'Write       --  stream output, force UTF-8
         (Text_Streams.Stream (Current_Output),
          UTF_Encoding.Wide_Strings.Encode (Ratio));
    end UTF;

The above source text uses only 7-bit encoding for post-latin-1
strings. Only comment text is using a wide_character.

If, instead, the source text is encoded using "more" bits and
contains post-latin-1 literals or identifiers, then the compiler
may need to be told. I think that BOMs may be of use, and in any
case, there are compiler switches or some other vendor-specific
vocabulary describing source text.
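
To illustrate the last point, here is a minimal sketch of the same
constant written as a literal. It assumes GNAT as the compiler and a
source file saved as UTF-8; the procedure name UTF_Literal is made up
for this example. With GNAT, the source encoding can, as far as I
know, be announced either with the -gnatW8 switch or with the
GNAT-specific pragma shown here. Other compilers will have their own
vocabulary. How Put_Line then encodes its output is a separate,
implementation-dependent matter.

```ada
--  Sketch only: assumes GNAT and a source file saved as UTF-8.
--  Alternatively, drop the pragma and compile with -gnatW8.
pragma Wide_Character_Encoding (UTF8);  --  GNAT-specific

with Ada.Wide_Text_IO;

procedure UTF_Literal is
   --  The euro sign now appears directly in the literal, so the
   --  compiler must know how the source text is encoded.
   Ratio : constant Wide_String := "$/€";
begin
   Ada.Wide_Text_IO.Put_Line (Ratio);
end UTF_Literal;
```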