From: Dennis Lee Bieber
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Sat, 30 Dec 2017 10:33:26 -0500
Message-ID: <19cf4dhtoec32ti6nnnduqrgatdj27phvm@4ax.com>
References: <023dc29b-dbc5-4fc8-b44f-d748517adec8@googlegroups.com>

On Sat, 30 Dec 2017 13:50:37 +0100, Björn Lundin declaimed the following:

>On 2017-12-28 23:36, Mehdi Saada wrote:
>> Myself:
>>> there are positions such as Wide_Character'Val(X) doesn't correspond to the Xth character in the UNICODE
>>> standard ??
>> Of course: Character'val(156) to 'val(255) are one byte long, whereas in UTF8 the corresponding code points are encoded with two bytes. Did I understood the lesson ?
>
>Yes - if it fits into 2 bytes. if not UTF-8 uses 3 and 4 bytes instead.
>So UTF-8 can use codepoints up to 32 bits (ca 4 billion)
>
>codepoint between
>1 -> 2**8 -1 = 1 byte

Isn't that 0..2^7-1? Any byte with the MSB set is part of a multibyte
sequence, and in the first (lead) byte of such a sequence, the number of
1 bits set before the first 0 bit indicates how many bytes the sequence
contains.

>2**8 -> 2**16 -1 = 2 bytes
>2**16 -> 2**24 -1 = 3 bytes
>2**24 -> 2**32 -1 = 4 bytes

-- 
Wulfraed                 Dennis Lee Bieber         AF6VN
wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
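For anyone following along, here is a minimal Python sketch of the length rules. It uses Python rather than Ada only so the results can be checked against Python's own str.encode("utf-8"), and the two function names are made up for illustration. It assumes the RFC 3629 form of UTF-8, which caps code points at U+10FFFF (21 bits). Under that form the byte-count boundaries fall at 2**7, 2**11 and 2**16, not at 2**8, 2**16 and 2**24, and every Latin-1 position from Character'Val(128) upward, not only from 156, takes two bytes.

```python
# Sketch of UTF-8 sequence lengths, assuming RFC 3629 (max code point U+10FFFF).
# Function names are hypothetical, chosen for this illustration only.

def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if code_point < 0x80:        # 0 .. 2**7 - 1: plain ASCII, MSB clear
        return 1
    if code_point < 0x800:       # 2**7 .. 2**11 - 1 (includes all of Latin-1 128..255)
        return 2
    if code_point < 0x10000:     # 2**11 .. 2**16 - 1
        return 3
    return 4                     # 2**16 .. 0x10FFFF


def sequence_length_from_lead_byte(lead: int) -> int:
    """Count the 1 bits before the first 0 bit of a lead byte.

    Only the bit count is checked; overlong leads such as 0xC0/0xC1
    and the continuation bytes themselves are not validated here.
    """
    if lead < 0x80:                  # 0xxxxxxx: single byte (ASCII)
        return 1
    if (lead & 0xC0) == 0x80:        # 10xxxxxx: continuation, not a lead byte
        raise ValueError("continuation byte, not a lead byte")
    count = 0
    mask = 0x80
    while lead & mask:               # walk down from the MSB while bits are 1
        count += 1
        mask >>= 1
    if count > 4:                    # 5- and 6-byte forms are not allowed by RFC 3629
        raise ValueError("invalid lead byte")
    return count


if __name__ == "__main__":
    for cp in (0x41, 0x7F, 0x80, 0x9C, 0xFF, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
        encoded = chr(cp).encode("utf-8")
        assert len(encoded) == utf8_length(cp)
        assert sequence_length_from_lead_byte(encoded[0]) == len(encoded)
        print(f"U+{cp:04X}: {len(encoded)} byte(s), lead byte {encoded[0]:08b}")
```

The lead-byte function mirrors the "count the high bits" rule literally; a real decoder would also reject 0xC0, 0xC1 and 0xF5..0xFF and verify that each following byte has the 10xxxxxx form.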