From: Xavier Petit <le@vieux.pro>
Subject: Re: win32 interfacing check (SetClipboardData)
Date: Sat, 2 Sep 2017 11:38:55 +0200
Message-ID: <59aa7c2f$0$9397$426a74cc@news.free.fr>
In-Reply-To: <oobm80$1ln1$1@gioia.aioe.org>
On 01/09/2017 at 15:10, Dmitry A. Kazakov wrote:
> On 01/09/2017 14:51, Xavier Petit wrote:
>> Thanks but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line);
>> I was getting weird clipboard text without -gnatW8 flag.
>
> But these are not UTF-8! They are UCS-2 and UCS-4.
Yes, but looking at:
https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html
It appears that without the -gnatW8 flag, "Brackets Coding" is the default:
- “In this encoding, a wide character is represented by the following
eight character sequence: [...]”
- “In this encoding, a wide wide character is represented by the
following ten or twelve byte character sequence”
...and with the flag, "UTF-8 Coding" is used: “A wide character is
represented using UCS Transformation Format 8 (UTF-8)”
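To make the difference concrete, here is a small sketch of what the brackets notation looks like for a single character. The helper name Brackets is made up for illustration, and picking the shortest of 4, 6 or 8 hex digits is only my reading of the "eight" vs "ten or twelve" character sequences quoted above, not the GNAT run-time's actual code.

```ada
with Ada.Text_IO;

procedure Brackets_Demo is
   Hex : constant String := "0123456789ABCDEF";

   --  Hypothetical helper: render one code point as ["hhhh"],
   --  ["hhhhhh"] or ["hhhhhhhh"], using the shortest form that fits
   --  (an assumption about how the 10- vs 12-character forms are chosen).
   function Brackets (C : Wide_Wide_Character) return String is
      Code  : Natural := Wide_Wide_Character'Pos (C);
      Width : constant Positive :=
        (if Code <= 16#FFFF# then 4
         elsif Code <= 16#FF_FFFF# then 6
         else 8);
      Result : String (1 .. Width);
   begin
      --  Fill the hex digits from the right, one nibble at a time.
      for I in reverse Result'Range loop
         Result (I) := Hex (Hex'First + Code mod 16);
         Code := Code / 16;
      end loop;
      return "[""" & Result & """]";
   end Brackets;
begin
   Ada.Text_IO.Put_Line (Brackets (Wide_Wide_Character'Val (16#E9#)));    -- ["00E9"]   (8 characters)
   Ada.Text_IO.Put_Line (Brackets (Wide_Wide_Character'Val (16#1F600#))); -- ["01F600"] (10 characters)
end Brackets_Demo;
```

So a UTF-8 terminal or file that sends raw multi-byte sequences is simply not speaking the encoding the run-time expects by default.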
I think I'm still missing something, because one thing is certain:
Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
(Ada.Wide_Wide_Text_IO.Get_Line) doesn't work without the UTF-8 flag...
From the UTF_Encoding.Wide_Wide_Strings package:
“
The encoding routines take a Wide_Wide_String as input and encode the
result using the specified UTF encoding method.
Encode Wide_Wide_String using UTF-8 encoding
Encode Wide_Wide_String using UTF_16 encoding
”
So does that mean Get_Line returns a Wide_Wide_String that isn't really
UCS-4, since Encode doesn't give me correct UTF-8/UTF-16 output without
the flag?
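One possible explanation, which I haven't verified: if Get_Line (under the default encoding) hands back each byte of a UTF-8 input as a separate character, Encode then encodes those bytes a second time. The sketch below only simulates that assumed mis-decoding for "é" (U+00E9) with hand-built strings; it does not call Get_Line.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Encode_Demo is
   use Ada.Strings.UTF_Encoding;

   --  "é" decoded correctly: one Wide_Wide_Character, U+00E9.
   Correct : constant Wide_Wide_String := (1 => Wide_Wide_Character'Val (16#E9#));

   --  Assumed mis-decoding: the UTF-8 bytes C3 A9 became two characters.
   Garbled : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#C3#), Wide_Wide_Character'Val (16#A9#));

   Good : constant UTF_8_String := Wide_Wide_Strings.Encode (Correct);
   Bad  : constant UTF_8_String := Wide_Wide_Strings.Encode (Garbled);

   --  Print each byte of a UTF-8 string as a decimal value.
   procedure Dump (Label : String; S : UTF_8_String) is
   begin
      Ada.Text_IO.Put (Label);
      for C of S loop
         Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
      end loop;
      Ada.Text_IO.New_Line;
   end Dump;
begin
   Dump ("correct:", Good);  -- 195 169          (C3 A9)
   Dump ("garbled:", Bad);   -- 195 131 194 169  (C3 83 C2 A9)
end Encode_Demo;
```

If that guess is right, Encode itself is fine: it faithfully encodes whatever code points it is given, and the damage happened earlier, at input.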
> ([Wide_]Wide_Text_IO should never be used, there is no single case one
> would need these.)
Yeah, I'm going to try to avoid the Wide_Wide packages. One thing I liked
about Wide_Wide_String is that 'Length gives the character count, not the byte count.
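For comparison, here is the same 'Length point with a plain UTF-8 String: the byte count and the code point count differ as soon as a non-ASCII character appears. Decoding with Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode gives one element per code point, which is still not always one visible glyph (combining accents, for example). The sample text is just an illustration.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Length_Demo is
   use Ada.Strings.UTF_Encoding;

   --  "héllo" as UTF-8 bytes: the "é" takes two bytes (C3 A9).
   Utf8 : constant UTF_8_String :=
     "h" & Character'Val (16#C3#) & Character'Val (16#A9#) & "llo";

   --  Decoding gives one Wide_Wide_Character per code point.
   Chars : constant Wide_Wide_String := Wide_Wide_Strings.Decode (Utf8);
begin
   Ada.Text_IO.Put_Line ("bytes:" & Natural'Image (Utf8'Length));        -- bytes: 6
   Ada.Text_IO.Put_Line ("code points:" & Natural'Image (Chars'Length)); -- code points: 5
end Length_Demo;
```

So the correct 'Length can be had on demand by decoding, while the data itself stays in UTF-8.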
> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved
> without BOM), you should use Ada.Streams.Stream_IO, best in binary mode
> if you are using GNAT.
>
> You will have to detect line ends manually, but at least there will be
> guaranty that the run-time does not mangle anything.
OK, by “binary mode”, do you mean reading into a Stream_Element_Array?
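A rough sketch of what I understand by that, assuming GNAT and a UTF-8 file without BOM: read raw bytes with Ada.Streams.Stream_IO into a Stream_Element_Array, split on LF, and drop an optional trailing CR. The file name and contents are hypothetical, and the program writes the sample file itself so it runs standalone. Each line is kept as raw UTF-8 bytes in a String; nothing is decoded.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Read_UTF8_Lines is
   package SIO renames Ada.Streams.Stream_IO;

   Name : constant String := "sample_utf8.txt";  -- hypothetical file name

   --  "héllo" CR LF "x" as raw UTF-8 bytes (é = C3 A9), no final LF.
   Sample : constant Stream_Element_Array (1 .. 9) :=
     (16#68#, 16#C3#, 16#A9#, 16#6C#, 16#6C#, 16#6F#, 13, 10, 16#78#);

   File      : SIO.File_Type;
   Buffer    : Stream_Element_Array (1 .. 4096);
   Last      : Stream_Element_Offset;
   Line      : Unbounded_String;  -- raw UTF-8 bytes of the current line
   Count     : Natural := 0;      -- lines seen so far
   First_Len : Natural := 0;      -- byte length of the first line

   --  Called at each line end: drop an optional trailing CR, then use the bytes.
   procedure Emit is
      S   : constant String := To_String (Line);
      Len : Natural := S'Length;
   begin
      if Len > 0 and then S (S'Last) = ASCII.CR then
         Len := Len - 1;
      end if;
      Count := Count + 1;
      if Count = 1 then
         First_Len := Len;
      end if;
      Ada.Text_IO.Put_Line
        ("line" & Natural'Image (Count) & ":" & Natural'Image (Len) & " bytes");
      Line := Null_Unbounded_String;
   end Emit;
begin
   --  Create the sample file so the sketch runs on its own.
   SIO.Create (File, SIO.Out_File, Name);
   SIO.Write (File, Sample);
   SIO.Close (File);

   SIO.Open (File, SIO.In_File, Name);
   while not SIO.End_Of_File (File) loop
      SIO.Read (File, Buffer, Last);
      for I in Buffer'First .. Last loop
         if Buffer (I) = 10 then            -- LF ends a line
            Emit;
         else
            Append (Line, Character'Val (Buffer (I)));
         end if;
      end loop;
   end loop;
   if Length (Line) > 0 then
      Emit;                                 -- last line had no final LF
   end if;
   SIO.Delete (File);                       -- closes and removes the sample
end Read_UTF8_Lines;
```

Stripping the CR at the line end rather than treating CR as a separator keeps both LF-only and CRLF files working with the same loop.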
> If you are using Windows calls with the "W" suffix, then all strings
> there are already UTF-16 and you don't need to convert anything.
OK, so if I stick with Win32 functions I'll only see UTF-16; if I deal with
external text sources (like files) or the Ada standard library, I'll have to
handle other encodings such as UTF-8, UCS-2, etc.
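At the boundary between the two, a minimal sketch of converting UTF-8 bytes into UTF-16 with Ada.Strings.UTF_Encoding.Conversions.Convert, so that a character outside the BMP (here U+1F600, chosen only as an example) becomes a surrogate pair. Actually handing the result to a "W" function (null terminator, SetClipboardData's global memory handle) is not shown here.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Conversions;

procedure To_UTF16_Demo is
   use Ada.Strings.UTF_Encoding;

   --  U+1F600 in UTF-8 (F0 9F 98 80): outside the BMP, so UTF-16 needs
   --  a surrogate pair.
   Utf8 : constant UTF_8_String :=
     Character'Val (16#F0#) & Character'Val (16#9F#) &
     Character'Val (16#98#) & Character'Val (16#80#);

   --  One-argument Convert: UTF-8 in, UTF-16 (as a Wide_String) out.
   Utf16 : constant UTF_16_Wide_String := Conversions.Convert (Utf8);
begin
   for C of Utf16 loop
      Ada.Text_IO.Put_Line (Integer'Image (Wide_Character'Pos (C)));
   end loop;
   --  Prints 55357 (D83D), then 56832 (DE00).
end To_UTF16_Demo;
```

Going through UTF_16_Wide_String rather than Wide_Strings.Decode matters here: a plain Wide_String decode can only represent BMP characters, while UTF-16 is what the "W" functions expect.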
Thread overview: 10+ messages
2017-08-29 20:28 win32 interfacing check (SetClipboardData) Xavier Petit
2017-08-30 16:04 ` Dmitry A. Kazakov
2017-08-30 18:41 ` Xavier Petit
2017-08-30 21:17 ` Dmitry A. Kazakov
2017-09-01 12:51 ` Xavier Petit
2017-09-01 13:10 ` Dmitry A. Kazakov
2017-09-02 9:38 ` Xavier Petit [this message]
2017-09-02 12:29 ` Dmitry A. Kazakov
2017-08-31 1:41 ` Randy Brukardt
2017-09-01 12:53 ` Xavier Petit