From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!news.unit0.net!peer01.am4!peer.am4.highwinds-media.com!peer03.iad!feed-me.highwinds-media.com!news.highwinds-media.com!border1.nntp.dca1.giganews.com!nntp.giganews.com!buffer1.nntp.dca1.giganews.com!buffer2.nntp.dca1.giganews.com!nntp.earthlink.com!news.earthlink.com.POSTED!not-for-mail
NNTP-Posting-Date: Sat, 30 Dec 2017 09:33:24 -0600
From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Sat, 30 Dec 2017 10:33:26 -0500
Organization: IISS Elusive Unicorn
Message-ID: <19cf4dhtoec32ti6nnnduqrgatdj27phvm@4ax.com>
References: <ccd8e071-c228-4518-967e-09011cd5e291@googlegroups.com>
 <023dc29b-dbc5-4fc8-b44f-d748517adec8@googlegroups.com>
 <p2822e$7eh$1@dont-email.me>
User-Agent: ForteAgent/8.00.32.1272
X-No-Archive: YES
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Usenet-Provider: http://www.giganews.com
NNTP-Posting-Host: 108.68.178.188
X-Trace: 
 sv3-MLkOIAkyU/Dl9YxfepBrCMVPNBs+OhIr8kqGplvcwgjIjeTFOFqIhVuOH4+KJ6NDoAnHucvUOC/udMZ!zH35VNk9tBY9WSDav0uE5Dog7hDVuF6tFzFLNQlZJpDT3R0tRsDAsKE0bpz9qTYwhOr6cBpvCkA1!ruEnc6uaKgwV/PbxDpee+XmRC9Fk
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint
 properly
X-Postfilter: 1.3.40
X-Original-Bytes: 2179
X-Received-Body-CRC: 1009475023
X-Received-Bytes: 2422
Xref: reader02.eternal-september.org comp.lang.ada:49696
Date: 2017-12-30T10:33:26-05:00
List-Id: <comp.lang.ada>

On Sat, 30 Dec 2017 13:50:37 +0100, Björn Lundin <b.f.lundin@gmail.com>
declaimed the following:

>On 2017-12-28 23:36, Mehdi Saada wrote:
>> Myself:
>>> there are positions such as Wide_Character'Val(X) doesn't correspond to the Xth character in the UNICODE standard ??
>> Of course: Character'val(156) to 'val(255) are one byte long, whereas in UTF8 the corresponding code points are encoded with two bytes. Did I understood the lesson ?
>
>Yes - if it fits into 2 bytes. if not UTF-8 uses 3 and 4 bytes instead.
>So UTF-8 can use codepoints up to 32 bits (ca 4 billion)
>
>codepoint between
>1     -> 2**8  -1 = 1 byte

	Isn't that 0 .. 2**7 - 1? Any byte with the MSB set is part of a
multibyte sequence (and in the first byte of the sequence, the number of
MSB bits set before a 0 bit indicates how many bytes it has).
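
	A rough sketch of that prefix-counting rule, in Python only because it
is quick to try out (nothing Ada-specific about it). The function name
utf8_sequence_length is made up for illustration, and the check is
deliberately loose: it looks only at the leading 1 bits and does not reject
lead bytes such as 0xC0, 0xC1 or 0xF5..0xF7, which strict UTF-8 forbids.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the length of a UTF-8 sequence, judged from its first byte.

    Hypothetical helper for illustration; it only inspects the prefix bits.
    """
    if not 0 <= lead_byte <= 0xFF:
        raise ValueError("not a byte value")

    # Count the 1 bits from the MSB down to the first 0 bit.
    ones = 0
    mask = 0x80
    while mask and lead_byte & mask:
        ones += 1
        mask >>= 1

    if ones == 0:
        return 1  # 0xxxxxxx: plain 7-bit character, one byte
    if ones == 1:
        raise ValueError("10xxxxxx is a continuation byte, not a first byte")
    if ones > 4:
        raise ValueError("more than four leading 1 bits is not valid UTF-8")
    return ones  # 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4
```

For example, utf8_sequence_length("\u00e9".encode("utf-8")[0]) gives 2,
since U+00E9 is encoded as the two bytes C3 A9.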

>2**8  -> 2**16 -1 = 2 bytes
>2**16 -> 2**24 -1 = 3 bytes
>2**24 -> 2**32 -1 = 4 bytes
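
	The other boundaries do not fall on powers of 2**8 either, as far as I
read RFC 3629: every byte gives up some bits to the prefix, so two bytes
carry 11 payload bits, three carry 16, and four carry 21 (capped at
U+10FFFF, the top of Unicode, rather than anything near 2**32). A small
Python sketch of the ranges, with a made-up function name, cross-checked
against Python's own encoder:

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes standard UTF-8 uses for one code point.

    Hypothetical helper for illustration, following the RFC 3629 ranges.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode range 0 .. 16#10FFFF#")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point < 0x80:       # 0 .. 2**7 - 1
        return 1
    if code_point < 0x800:      # 2**7 .. 2**11 - 1
        return 2
    if code_point < 0x10000:    # 2**11 .. 2**16 - 1
        return 3
    return 4                    # 2**16 .. 16#10FFFF#


# Compare with the built-in encoder on both sides of each boundary.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```

So Character'Val(128) .. Character'Val(255) all land in the two-byte range,
which is the case the original question was about.
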
>
>-- 
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/