From: Björn Lundin
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Sat, 30 Dec 2017 13:50:37 +0100

On 2017-12-28 23:36, Mehdi Saada wrote:
> Myself:
>> there are positions where Wide_Character'Val(X) doesn't correspond to the Xth character in the UNICODE standard?
> Of course: Character'Val(156) to 'Val(255) are one byte long, whereas in UTF-8 the corresponding code points are encoded with two bytes. Did I understand the lesson?

Yes, with one correction: the two-byte range starts at 128, not 156. The code point itself does not change (Character'Val(156) and Wide_Character'Val(156) are both U+009C); only its UTF-8 encoding is longer than one byte. Code points that do not fit in two bytes use three or four bytes instead.

UTF-8 does not go up to 32 bits, though. Unicode ends at U+10FFFF (21 bits, about 1.1 million code points), so UTF-8 never needs more than four bytes. The original UTF-8 definition (RFC 2279) allowed up to 31 bits in as many as six bytes, but RFC 3629 restricted it to the Unicode range.

code point                          UTF-8 length
0     -> 2**7  - 1  (16#7F#)      = 1 byte
2**7  -> 2**11 - 1  (16#7FF#)     = 2 bytes
2**11 -> 2**16 - 1  (16#FFFF#)    = 3 bytes
2**16 -> 16#10FFFF#               = 4 bytes

--
Björn
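The byte-count table in the reply can be checked with a short sketch. It is written in Python rather than Ada purely for brevity, and assumes the RFC 3629 ranges; utf8_length is a hypothetical helper for illustration, cross-checked against Python's built-in str.encode. It also rejects the surrogate range 16#D800# .. 16#DFFF#, which UTF-8 does not encode.

```python
# Sketch: number of bytes UTF-8 needs for one code point (RFC 3629 ranges).
# utf8_length is a hypothetical helper written for this illustration.

def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 uses to encode code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range 0 .. 16#10FFFF#")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encoded in UTF-8")
    if code_point <= 0x7F:      # 0 .. 2**7 - 1
        return 1
    if code_point <= 0x7FF:     # 2**7 .. 2**11 - 1
        return 2
    if code_point <= 0xFFFF:    # 2**11 .. 2**16 - 1
        return 3
    return 4                    # 2**16 .. 16#10FFFF#


if __name__ == "__main__":
    # Boundary values, plus 156 (Character'Val(156) from the thread).
    for cp in (0x41, 0x7F, 0x80, 156, 0xFF, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
        encoded = chr(cp).encode("utf-8")
        # The helper must agree with Python's own UTF-8 encoder.
        assert len(encoded) == utf8_length(cp)
        print(f"U+{cp:04X}: {utf8_length(cp)} byte(s) -> {encoded.hex(' ')}")
```

Running it prints one line per code point with its length and encoded bytes; U+009C (156) comes out as two bytes, as does everything from U+0080 to U+07FF.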