From: Stoik <staszek.goldstein@gmail.com>
Subject: Re: strange behaviour of utf-8 files
Date: Sun, 17 Nov 2013 03:12:21 -0800 (PST)
Message-ID: <7464679c-6b98-4e23-a337-83b671473553@googlegroups.com>
In-Reply-To: <z2fwn0g0hlr3$.1bktkfuljfy6b.dlg@40tude.net>
On Saturday, 16 November 2013 16:57:56 UTC+1, Dmitry A. Kazakov wrote:
> On Sat, 16 Nov 2013 07:12:20 -0800 (PST), Stoik wrote:
>
> > By the way, nothing changes if I use wide_character and wide_string
> > instead of character and string. Even if character=octet, certainly
> > wide_character is not an octet!
>
> String = Latin1
> Wide_String = UCS-2
>
> There is no built-in type for UTF-8, though customary one uses String for
> it (and Wide_String for UTF-16).
>
> --
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de
Thanks for your comments. It is obviously a matter of the editor and the compiler using different encodings: I forgot to add the -gnatW8 switch to the compiler (this should be the default, I believe). Nevertheless, there are still some misunderstandings connected with String, Wide_String and Wide_Wide_String. They do not correspond to any encodings; they correspond only to the character repertoires of the encodings you mentioned: String to the first 256 characters of Unicode (ISO/IEC 10646), Wide_String to the BMP, and Wide_Wide_String to the whole of Unicode. In particular, a Wide_String can be represented internally using any of UTF-8, UTF-16 or UTF-32; the programmer does not need to know anything about it.
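To make the repertoire/encoding distinction concrete, here is a minimal sketch in Python rather than Ada, chosen only because it is quick to run. The helper names smallest_ada_type and encodings_of are made up for this illustration. The first names the narrowest Ada character type whose repertoire contains a given character; the second shows that the same character has different byte sequences in UTF-8, UTF-16 and UTF-32 while remaining one and the same code point.

```python
def smallest_ada_type(ch: str) -> str:
    """Hypothetical helper: narrowest Ada character type that can hold ch."""
    code_point = ord(ch)
    if code_point <= 0xFF:        # first 256 code points (Latin-1 repertoire)
        return "Character"
    if code_point <= 0xFFFF:      # Basic Multilingual Plane
        return "Wide_Character"
    return "Wide_Wide_Character"  # the rest of Unicode


def encodings_of(ch: str) -> dict:
    """Hypothetical helper: the same character in three encoding forms."""
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    for ch in ("é", "ą", "😀"):
        print(repr(ch), hex(ord(ch)), smallest_ada_type(ch), encodings_of(ch))
```

The repertoire test looks only at the code point; the byte sequences differ per encoding, yet each decodes back to the same character.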
I do not believe one should avoid using characters outside ASCII in the source code. I tried it in Python and Java with no problems whatsoever. Having to use some strange constants instead of the usual glyphs for non-ASCII characters when calling subprograms from Ada.Strings.Maps or Ada.Strings.Wide_Maps, for example To_Mapping, would be gruesome.
In any case, GNAT is prepared to deal with the problem properly, although the number of steps the user must remember is a bit too high: setting the environment variable charset to utf-8, choosing UTF-8 in the source editor, adding -gnatW8 to the compiler switches and -W8 to the pretty-printer switches. And UTF-8 is the only encoding that solves the problem of non-Latin-1 characters at all.
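Why all the steps have to agree can be sketched in a few lines of Python. This is only an illustration, under the assumption that a tool given no encoding switch reads the source bytes as Latin-1; misread_as_latin1 is a made-up name. The editor saves UTF-8 bytes, the tool decodes them as Latin-1, and one character turns into two.

```python
def misread_as_latin1(text: str) -> str:
    """Hypothetical helper: bytes saved as UTF-8 but decoded as Latin-1."""
    saved_bytes = text.encode("utf-8")   # what a UTF-8 editor writes to disk
    return saved_bytes.decode("latin-1") # what a Latin-1 reader makes of them


if __name__ == "__main__":
    original = "é"
    seen = misread_as_latin1(original)
    print(len(original), len(seen), repr(seen))
```

No information is lost in the misreading (encoding the result as Latin-1 and decoding it as UTF-8 restores the original), but any length or character comparison made on the misread text gives the wrong answer.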
Regards
Thread overview: 33+ messages
2013-11-16 13:12 strange behaviour of utf-8 files Stoik
2013-11-16 13:34 ` Dmitry A. Kazakov
2013-11-16 15:09 ` Stoik
2013-11-16 15:55 ` Dmitry A. Kazakov
2013-11-17 13:32 ` Georg Bauhaus
2013-11-17 14:07 ` Dmitry A. Kazakov
2013-11-17 17:19 ` Dennis Lee Bieber
2013-11-17 18:07 ` Dmitry A. Kazakov
2013-11-17 19:05 ` Georg Bauhaus
2013-11-17 20:38 ` Dmitry A. Kazakov
2013-11-18 8:38 ` Georg Bauhaus
2013-11-18 9:01 ` Dmitry A. Kazakov
2013-11-18 10:06 ` Georg Bauhaus
2013-11-18 8:44 ` Georg Bauhaus
2013-11-18 10:24 ` Dmitry A. Kazakov
2013-11-18 13:05 ` G.B.
2013-11-18 15:25 ` Dmitry A. Kazakov
2013-11-18 15:51 ` G.B.
2013-11-18 17:34 ` Dmitry A. Kazakov
2013-11-18 0:34 ` Stoik
2013-11-16 17:01 ` Georg Bauhaus
2013-11-17 10:38 ` Stoik
2013-11-16 15:12 ` Stoik
2013-11-16 15:57 ` Dmitry A. Kazakov
2013-11-17 11:12 ` Stoik [this message]
2013-11-22 1:03 ` Randy Brukardt
2013-11-22 3:02 ` Shark8
2013-11-22 11:54 ` Georg Bauhaus
2013-11-23 4:14 ` Randy Brukardt
2013-12-06 2:17 ` Georg Bauhaus
2013-11-16 20:06 ` Peter C. Chapin
2013-11-17 10:34 ` Stoik
2013-11-22 0:53 ` Randy Brukardt