From: Mehdi Saada <00120260a@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: unicode and wide_text_io
Date: Thu, 28 Dec 2017 14:36:10 -0800 (PST)
Message-ID: <023dc29b-dbc5-4fc8-b44f-d748517adec8@googlegroups.com>

I took some time to read up on encodings, character sets, Unicode, what UTF-8, UTF-16 and UTF-32 are, little and big endian, the BOM, etc. Now that I've done that, your comments, Dmitry, sound accurate, and it turns out I really knew nothing about characters/glyphs/code points.
It wasn't so complicated in the end. I'll look at your work shortly. Since I long to work in the area of interfaces and command-line utilities, the sooner I learn all about characters, the better. Thanks for your explanations, you guys ;-)

Myself:
> there are positions such as Wide_Character'Val(X) doesn't correspond to the Xth character in the UNICODE standard ??

Of course: Character'Val(156) to Character'Val(255) are one byte long, whereas in UTF-8 the corresponding code points are encoded with two bytes. Did I understand the lesson?
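
To make the byte counts concrete, here is a minimal sketch, assuming an Ada 2012 compiler that provides Ada.Strings.UTF_Encoding (RM A.4.11). The names Show_UTF8_Length and Report are made up for illustration. It encodes single code points as UTF-8 and prints how many bytes each one takes. Note that the two-byte range actually starts at position 128, not 156: only positions 0 to 127 stay one byte long in UTF-8.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;

--  Hypothetical demo procedure; not taken from the thread.
procedure Show_UTF8_Length is

   --  Encode the code point at position Pos as UTF-8 and report
   --  how many bytes the encoding occupies.
   procedure Report (Pos : Natural) is
      --  Wide_Character positions coincide with Unicode code points
      --  (Basic Multilingual Plane).
      W : constant Wide_String := (1 => Wide_Character'Val (Pos));

      --  UTF_8_String is a subtype of String: one Character per byte.
      Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
        Ada.Strings.UTF_Encoding.Wide_Strings.Encode (W);
   begin
      Ada.Text_IO.Put_Line
        ("code point" & Natural'Image (Pos)
         & " ->" & Natural'Image (Encoded'Length)
         & " byte(s) in UTF-8");
   end Report;

begin
   Report (65);    --  'A': 1 byte
   Report (127);   --  last one-byte code point: 1 byte
   Report (128);   --  first two-byte code point: 2 bytes
   Report (156);   --  2 bytes
   Report (255);   --  2 bytes
end Show_UTF8_Length;
```

In a plain String, by contrast, Character'Val (156) or Character'Val (255) is a single Latin-1 byte, which is why writing such a String to a UTF-8 terminal without encoding it first shows garbage.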