From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=unavailable autolearn_force=no version=3.4.4
X-Received: by 10.66.161.195 with SMTP id xu3mr6466545pab.33.1384686741918;
 Sun, 17 Nov 2013 03:12:21 -0800 (PST)
X-Received: by 10.49.127.177 with SMTP id nh17mr1031qeb.30.1384686741852; Sun,
 17 Nov 2013 03:12:21 -0800 (PST)
Path: 
 border1.nntp.ams.giganews.com!nntp.giganews.com!feeder.erje.net!eu.feeder.erje.net!news.glorb.com!y3no9543871pbx.0!news-out.google.com!9ni33120qaf.0!nntp.google.com!i2no3871243qav.0!postnews.google.com!glegroupsg2000goo.googlegroups.com!not-for-mail
Newsgroups: comp.lang.ada
Date: Sun, 17 Nov 2013 03:12:21 -0800 (PST)
In-Reply-To: <z2fwn0g0hlr3$.1bktkfuljfy6b.dlg@40tude.net>
Complaints-To: groups-abuse@google.com
Injection-Info: glegroupsg2000goo.googlegroups.com;
 posting-host=31.183.18.217;
 posting-account=fc1UmgoAAADREbhuD8e4smj7nsEdRFz9
NNTP-Posting-Host: 31.183.18.217
References: <73e0853b-454a-467f-9dc7-84ca5b9c29b2@googlegroups.com>
 <1ghx537y5gbfq.17oazom68d4n6.dlg@40tude.net>
 <5bf1b290-70bc-4240-b27c-120ce6b0b840@googlegroups.com>
 <z2fwn0g0hlr3$.1bktkfuljfy6b.dlg@40tude.net>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <7464679c-6b98-4e23-a337-83b671473553@googlegroups.com>
Subject: Re: strange behaviour of utf-8 files
From: Stoik <staszek.goldstein@gmail.com>
Injection-Date: Sun, 17 Nov 2013 11:12:21 +0000
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Xref: number.nntp.dca.giganews.com comp.lang.ada:183908
Date: 2013-11-17T03:12:21-08:00
List-Id: <comp.lang.ada>

On Saturday, 16 November 2013 at 16:57:56 UTC+1, Dmitry A. Kazakov wrote:
> On Sat, 16 Nov 2013 07:12:20 -0800 (PST), Stoik wrote:
>
> > By the way, nothing changes if I use wide_character and wide_string
> > instead of character and string. Even if character=octet, certainly
> > wide_character is not an octet!
>
> String = Latin1
> Wide_String = UCS-2
>
> There is no built-in type for UTF-8, though customary one uses String for
> it (and Wide_String for UTF-16).
>
> -- 
> Regards,
> Dmitry A. Kazakov
> http://www.dmitry-kazakov.de

Thanks for your comments. It was obviously a matter of the editor and the
compiler using different encodings: I forgot to pass the -gnatW8 switch to
the compiler (this should be the default, I believe). Nevertheless, there
is still some misunderstanding about string, wide_string and
wide_wide_string. They do not correspond to any encodings; they correspond
only to the character repertoires of the encodings you mentioned: string to
the first 256 characters of Unicode (ISO 10646), wide_string to the BMP,
and wide_wide_string to the whole of Unicode. In particular, a wide_string
can be encoded internally using any of UTF-8, UTF-16 or UTF-32; the
programmer does not need to know anything about it.
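
To make the distinction between repertoire and encoding concrete, here is
a minimal sketch. It assumes an Ada 2012 compiler, since it relies on the
standard package Ada.Strings.UTF_Encoding.Wide_Strings; the procedure name
and the sample character are my own choices for illustration.

```ada
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure Repertoire_Demo is
   use Ada.Strings.UTF_Encoding;

   --  U+017C written as its two UTF-8 octets. Character'Val keeps the
   --  example independent of the encoding of this source file.
   Encoded : constant UTF_8_String :=
     Character'Val (16#C5#) & Character'Val (16#BC#);

   --  Decode turns the UTF-8 octets into elements of the Wide_Character
   --  repertoire; how those are stored is the compiler's business.
   Decoded : constant Wide_String := Wide_Strings.Decode (Encoded);
begin
   --  Size of each repertoire, as the position of its last element.
   Put_Line ("Character'Last pos:     "
             & Integer'Image (Character'Pos (Character'Last)));
   Put_Line ("Wide_Character'Last pos:"
             & Integer'Image (Wide_Character'Pos (Wide_Character'Last)));

   --  Two octets in the encoded form, one character after decoding.
   Put_Line ("UTF-8 octets:   " & Integer'Image (Encoded'Length));
   Put_Line ("Wide characters:" & Integer'Image (Decoded'Length));
   Put_Line ("Code point:     "
             & Integer'Image (Wide_Character'Pos (Decoded (Decoded'First))));
end Repertoire_Demo;
```

The String holds octets of an encoding, while the Wide_String holds
characters of a repertoire; the lengths differ for exactly that reason.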

I do not believe one should avoid using characters from outside ASCII in
the source code. I tried it in Python and Java with no problems whatsoever.
Using strange numeric constants instead of the usual glyphs for non-ASCII
characters when calling subprograms from ada.(wide_)strings.maps, for
example to_mapping, would be gruesome.
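
A small sketch of what I mean, assuming the file is saved as UTF-8 and
compiled with GNAT's -gnatW8 switch. The procedure name and the Polish
sample letters are only illustrative.

```ada
--  Build (assumption: GNAT, source saved as UTF-8):
--    gnatmake -gnatW8 strip_accents.adb
with Ada.Strings.Wide_Maps; use Ada.Strings.Wide_Maps;
with Ada.Strings.Wide_Fixed;
with Ada.Wide_Text_IO;

procedure Strip_Accents is
   --  Readable version: the glyphs appear directly in the source.
   Readable : constant Wide_Character_Mapping :=
     To_Mapping (From => "ąćęłńóśźż", To => "acelnoszz");

   --  The same idea for only the first two letters, spelled with
   --  code points. Nine of these in a row would be hard to review.
   Cryptic : constant Wide_Character_Mapping :=
     To_Mapping
       (From => Wide_Character'Val (16#0105#)
                & Wide_Character'Val (16#0107#),
        To   => "ac");
begin
   --  Both mappings agree on the letters they share.
   Ada.Wide_Text_IO.Put_Line
     (Ada.Strings.Wide_Fixed.Translate ("żółć", Readable));
   Ada.Wide_Text_IO.Put_Line
     (Ada.Strings.Wide_Fixed.Translate ("ąć", Cryptic));
end Strip_Accents;
```

Without -gnatW8 the compiler would not read the literals as UTF-8, which
is the kind of mismatch that started this thread.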

In any case, GNAT is prepared to deal with the problem properly, although
the number of steps the user must remember is a bit too high: setting the
environment variable charset to utf-8, choosing UTF-8 in the source editor,
adding -gnatW8 to the compiler switches and -W8 to the pretty-printer
switches. And UTF-8 is the only encoding that solves the problem of
non-Latin1 characters at all.
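
As far as I can tell, one of these steps can be recorded in the source
itself with the GNAT-specific pragma Wide_Character_Encoding, which
declares the encoding of the text that follows it. I have not used it in
my own project, so take the sketch below as an assumption about how it
would look rather than a tested recipe; the procedure name and the sample
sentence are illustrative.

```ada
--  GNAT-specific pragma (not standard Ada): the source text after this
--  point is to be read as UTF-8, as -gnatW8 would request.
pragma Wide_Character_Encoding (UTF8);

with Ada.Text_IO;

procedure Hello_Utf8 is
   --  17 characters, which occupy 26 octets when the file is UTF-8.
   Greeting : constant Wide_String := "Zażółć gęślą jaźń";
begin
   --  Print the length rather than the text, so the result does not
   --  depend on how the terminal or Wide_Text_IO encodes output.
   Ada.Text_IO.Put_Line
     ("Characters seen by the compiler:"
      & Integer'Image (Greeting'Length));
end Hello_Utf8;
```

If the count matches the number of characters rather than the number of
octets, the compiler and the editor agree on the encoding. The editor
setting and the pretty-printer switch still have to be set separately.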

Regards