From: Björn Lundin
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Sun, 31 Dec 2017 00:20:16 +0100

On 2017-12-30 16:33, Dennis Lee Bieber wrote:
>> codepoint between
>> 1 -> 2**8 -1 = 1 byte
> Isn't that 0..2^7... Any byte with the MSB set is a multibyte code (and
> number of MSB bits set before a 0 bit indicates how many bytes).
>
>> 2**8 -> 2**16 -1 = 2 bytes
>> 2**16 -> 2**24 -1 = 3 bytes
>> 2**24 -> 2**32 -1 = 4 bytes

You are probably right; I only meant to illustrate the principle: a UTF-8
encoded character can take more than 2 bytes, and the encoding expands as
needed, up to 4 bytes.

--
Björn
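Dennis's correction can be checked with a small sketch. It is written in Python rather than Ada, and the helper names utf8_length and lead_byte_length are hypothetical, made up for illustration. Per RFC 3629, a single byte covers 0 .. 2**7 - 1 (MSB clear). Each continuation byte carries only 6 payload bits, so 2 bytes reach 2**11 - 1 (U+07FF), 3 bytes reach 2**16 - 1 (U+FFFF), and 4 bytes cover the rest of Unicode, up to U+10FFFF. The lead byte announces the sequence length through its count of leading 1 bits. The sketch compares both helpers against Python's own UTF-8 encoder.

```python
def utf8_length(cp: int) -> int:
    """Number of bytes UTF-8 uses to encode code point cp (RFC 3629 ranges)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if cp <= 0x7F:        # 0 .. 2**7 - 1: one byte, MSB clear
        return 1
    if cp <= 0x7FF:       # up to 2**11 - 1: 5 + 6 payload bits
        return 2
    if cp <= 0xFFFF:      # up to 2**16 - 1: 4 + 6 + 6 payload bits
        return 3
    return 4              # up to 0x10FFFF: 3 + 6 + 6 + 6 payload bits


def lead_byte_length(b: int) -> int:
    """Sequence length announced by a lead byte: its count of leading 1 bits.

    Rejects continuation bytes (10xxxxxx) and leads with more than 4 ones,
    but does not check other invalid leads such as 0xC0, 0xC1 or 0xF5..0xF7.
    """
    if b < 0x80:
        return 1
    n = 0
    while b & 0x80:
        n += 1
        b = (b << 1) & 0xFF
    if n == 1 or n > 4:
        raise ValueError("continuation byte or invalid lead byte")
    return n


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        encoded = ch.encode("utf-8")
        assert len(encoded) == utf8_length(ord(ch)) == lead_byte_length(encoded[0])
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Note that the upper bound is U+10FFFF rather than 2**32 - 1: the original UTF-8 design allowed sequences of up to 6 bytes, but RFC 3629 restricts it to 4 to match the Unicode code space.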