comp.lang.ada
From: john@peppermind.com
Subject: Re: A few questions on parsing, sockets, UTF-8 strings
Date: Thu, 11 Aug 2016 11:22:55 -0700 (PDT)
Message-ID: <4c6509a9-5ff2-4f94-b2c3-55d89ca2b076@googlegroups.com>
In-Reply-To: <noidr9$atk$1@gioia.aioe.org>

On Thursday, August 11, 2016 at 6:49:33 PM UTC+1, Dmitry A. Kazakov wrote:

> An ASCII string is a UTF-8 string. The reverse is false.

You're right, ASCII uses only 0..127 as code points. But I thought that Ada fixed strings hold one byte per character, meaning that I can store UTF-8 in them? Am I mistaken about that?

> > So if I Base64 encode this directly, do I have to care about UTF-8?
> 
> No, if it is strictly ASCII. Yes, if you are going to use other Unicode 
> code points.

Sorry for being such a noob, but I still don't get it. If GNAT GPS is set to UTF-8 (-gnatW8 for gnatmake, and the source encoding in the GPS preferences), doesn't that mean that if I enter a Unicode character into a fixed string literal (just String, not Wide_String or Wide_Wide_String), the string will contain that character as the bytes of its UTF-8 encoding? So if the code point takes two bytes in UTF-8, the string will contain two bytes?

In that case, as long as I never need to access individual characters, could I stick with fixed strings?
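
To make the byte counts in question concrete, here is a small language-neutral sketch in Python (not Ada). It only shows how UTF-8 itself behaves: ASCII text keeps the same bytes, a code point such as U+00E9 takes two bytes, and Base64 works on those bytes without caring what they mean. It says nothing about what GNAT actually puts into a String literal under -gnatW8; the helper name utf8_len and the sample strings are made up for illustration.

```python
import base64


def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "Ada"     # only code points in 0..127
accented = "\u00e9"    # U+00E9 (e with acute accent), outside ASCII

# ASCII text produces exactly the same bytes in ASCII and in UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Base64 operates on the raw bytes, so the UTF-8 bytes survive a round trip.
encoded = base64.b64encode(accented.encode("utf-8"))
decoded = base64.b64decode(encoded).decode("utf-8")

print(same_bytes)            # ASCII bytes are valid UTF-8 bytes
print(utf8_len(ascii_text))  # one byte per ASCII character
print(utf8_len(accented))    # more than one byte for a non-ASCII character
print(decoded == accented)   # Base64 round trip preserves the text
```

The point of the sketch is that a container holding one byte per element can carry UTF-8 data, but its length then counts bytes rather than characters.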


