From: john@peppermind.com
Newsgroups: comp.lang.ada
Subject: Re: A few questions on parsing, sockets, UTF-8 strings
Date: Thu, 11 Aug 2016 11:22:55 -0700 (PDT)
Message-ID: <4c6509a9-5ff2-4f94-b2c3-55d89ca2b076@googlegroups.com>

On Thursday, August 11, 2016 at 6:49:33 PM UTC+1, Dmitry A. Kazakov wrote:
> ASCII string is an UTF-8 string. The reverse is false.

You're right, ASCII uses only 0..127 as code points.
But I thought that Ada fixed strings hold one byte per character, meaning that I can store UTF-8 in them? Am I mistaken about that?

> > So if I Base64 encode this directly, do I have to care about UTF-8?
>
> No, if it is strictly ASCII. Yes, if you are going to use other Unicode
> code points.

Sorry for being such a noob, but I still don't get it. If GNAT GPS is set to UTF-8 (-gnatW8 for gnatmake and source encoding in GPS preferences), doesn't that mean that if I enter a Unicode character into a fixed string literal (just String, not Wide_String or Wide_Wide_String), the string will contain this character as however many bytes the UTF-8 encoding of that code point requires? So if the code point takes two bytes in UTF-8, the string will contain two bytes?

In that case, as long as I never need to access single characters, could I stick with fixed strings?
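
To make concrete what I mean by "as many bytes as the code point requires", here is a small sketch in Python rather than Ada, only because it is quick to check. The helper name utf8_len and the sample text are made up for illustration. It shows the byte counts I have in mind and that Base64 only ever sees bytes. It says nothing about what GNAT actually stores in a String literal under -gnatW8, which is the part I'm unsure about.

```python
import base64


def utf8_len(text: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of text."""
    return len(text.encode("utf-8"))


# Escapes are used so the example does not depend on the source file encoding.
samples = ["A", "\u00e9", "\u20ac"]  # 'A', e-acute, euro sign
for ch in samples:
    # Prints 1, 2 and 3 byte(s) respectively.
    print(f"U+{ord(ch):04X} -> {utf8_len(ch)} byte(s)")

# Base64 operates on raw bytes, so it round-trips UTF-8 data unchanged
# as long as nobody indexes or slices the text per character in between.
raw = "caf\u00e9".encode("utf-8")  # 5 bytes: 'c', 'a', 'f', 0xC3, 0xA9
encoded = base64.b64encode(raw)
decoded = base64.b64decode(encoded)
print(len(raw), decoded == raw)
```

If the Ada String really does end up holding those same bytes, then treating it as an opaque byte sequence and handing it to a Base64 encoder should behave like the last three lines above.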