From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Sat, 30 Dec 2017 16:56:24 +0100
Organization: Aioe.org NNTP Server
References: <023dc29b-dbc5-4fc8-b44f-d748517adec8@googlegroups.com> <19cf4dhtoec32ti6nnnduqrgatdj27phvm@4ax.com>

On 2017-12-30 16:33, Dennis Lee Bieber wrote:

> Isn't that 0..2^7... Any byte with the MSB set is a multibyte code (and
> number of MSB bits set before a 0 bit indicates how many bytes).

Yes. Furthermore, every subsequent (continuation) octet also has its MSB set: it has the form 10xxxxxx, so it can never be mistaken for the first octet of a sequence. The reason for this "waste" is to allow bidirectional scanning of UTF-8 strings.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
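
To illustrate the bidirectional scanning mentioned above, here is a minimal sketch in Python (not from the original thread). It assumes the input is already valid UTF-8 and does no error checking; the helper names `is_continuation`, `next_start` and `previous_start` are hypothetical, chosen only for this example. Because continuation octets always look like 10xxxxxx, the start of a code point can be found by skipping them in either direction.

```python
def is_continuation(octet: int) -> bool:
    """True for UTF-8 continuation octets, which have the form 10xxxxxx."""
    return (octet & 0xC0) == 0x80


def next_start(data: bytes, index: int) -> int:
    """Start index of the code point after the one that begins at index."""
    index += 1
    while index < len(data) and is_continuation(data[index]):
        index += 1
    return index


def previous_start(data: bytes, index: int) -> int:
    """Start index of the code point that ends just before index."""
    index -= 1
    while index > 0 and is_continuation(data[index]):
        index -= 1
    return index


# Assumed sample: 'a', e-acute and the euro sign take 1, 2 and 3 octets.
text = "a\u00e9\u20ac".encode("utf-8")

# Walk forward, recording where each code point starts.
forward = []
i = 0
while i < len(text):
    forward.append(i)
    i = next_start(text, i)

# Walk backward from the end, recording the same start positions.
backward = []
i = len(text)
while i > 0:
    i = previous_start(text, i)
    backward.append(i)
```

Both walks visit the same start offsets, just in opposite order, and neither needs to decode from the beginning of the string. Without the marker bits on continuation octets, the backward walk would have no way to tell where a sequence begins.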