comp.lang.ada
From: Xavier Petit <le@vieux.pro>
Subject: Re: win32 interfacing check (SetClipboardData)
Date: Sat, 2 Sep 2017 11:38:55 +0200
Message-ID: <59aa7c2f$0$9397$426a74cc@news.free.fr>
In-Reply-To: <oobm80$1ln1$1@gioia.aioe.org>

On 01/09/2017 at 15:10, Dmitry A. Kazakov wrote:
> On 01/09/2017 14:51, Xavier Petit wrote:
>> Thanks but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line); 
>> I was getting weird clipboard text without -gnatW8 flag.
> 
> But these are not UTF-8! They are UCS-2 and UCS-4.
Yes, but having a look at:
https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html

It appears that without the -gnatW8 flag, "Brackets Coding" is the default:
- “In this encoding, a wide character is represented by the following 
eight character sequence: [...]”
- “In this encoding, a wide wide character is represented by the 
following ten or twelve byte character sequence”

...and with the flag, "UTF-8 Coding" is used: “A wide character is
represented using UCS Transformation Format 8 (UTF-8)”

I think I'm still missing something, because one thing is sure:
Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
(Ada.Wide_Wide_Text_IO.Get_Line) does not work without the UTF-8 flag...

From the UTF_Encoding.Wide_Wide_Strings package:
“
The encoding routines take a Wide_Wide_String as input and encode the
result using the specified UTF encoding method.
Encode Wide_Wide_String using UTF-8 encoding
Encode Wide_Wide_String using UTF_16 encoding
”
So does this mean that Get_Line returns a Wide_Wide_String that is not
UCS-4 encoded? I ask because Encode doesn't return a UTF-8 or UTF-16
encoding without the flag.
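
To make the question concrete, here is a minimal sketch of one possible
explanation. It is an assumption on my part, not something established in
this thread: if Get_Line returned the UTF-8 bytes of the input as
individual Wide_Wide_Characters instead of decoded code points, Encode
would encode every byte a second time. The names Encode_Demo and Put_Hex
are made up for the illustration, and the strings are built by hand from
code point values so that the source file stays pure ASCII.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Encode_Demo is
   use Ada.Strings.UTF_Encoding;

   --  Hypothetical helper: print each byte of a UTF-8 string in hex.
   procedure Put_Hex (Label : String; Item : UTF_8_String) is
      Digit : constant String := "0123456789ABCDEF";
   begin
      Ada.Text_IO.Put (Label);
      for C of Item loop
         Ada.Text_IO.Put
           (' ' & Digit (Character'Pos (C) / 16 + 1)
                & Digit (Character'Pos (C) mod 16 + 1));
      end loop;
      Ada.Text_IO.New_Line;
   end Put_Hex;

   --  Case 1: one element holding the code point U+00E9.
   Decoded : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#E9#));

   --  Case 2 (the assumed failure mode): the UTF-8 bytes C3 A9 of the
   --  same character, each stored as a separate element.
   Raw_Bytes : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#C3#),
      Wide_Wide_Character'Val (16#A9#));
begin
   Put_Hex
     ("decoded  :",
      Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Decoded));
   Put_Hex
     ("raw bytes:",
      Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Raw_Bytes));
end Encode_Demo;
```

The first line should show C3 A9 and the second C3 83 C2 A9, i.e. a
doubly encoded character, which would look like the kind of mangled text
described above.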

> ([Wide_]Wide_Text_IO should never be used, there is no single case one 
> would need these.)
Yeah, I'm going to try not to use the Wide_Wide packages. One thing I
liked about Wide_Wide_String is the correct 'Length attribute.

> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved 
> without BOM), you should use Ada.Streams.Stream_IO, best in binary mode 
> if you are using GNAT.
> 
> You will have to detect line ends manually, but at least there will be 
> a guarantee that the run-time does not mangle anything.
OK, by “binary mode”, do you mean using Stream_Element_Array?
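
In case it helps to pin down what I am asking, this is a rough sketch of
how I read that suggestion: open the file with Ada.Streams.Stream_IO,
read raw Stream_Elements into a buffer and look for the line feed byte
myself. The procedure name Read_Raw, the file name "input.txt" and the
buffer size are placeholders, and whether this is what is meant by
“binary mode” is exactly my question.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Read_Raw is
   package SIO renames Ada.Streams.Stream_IO;

   LF : constant Stream_Element := 10;  --  ASCII line feed

   File   : SIO.File_Type;
   Buffer : Stream_Element_Array (1 .. 4096);  --  arbitrary size
   Last   : Stream_Element_Offset;
   Lines  : Natural := 0;
begin
   --  "input.txt" is a hypothetical UTF-8 file saved without BOM.
   SIO.Open (File, SIO.In_File, "input.txt");

   while not SIO.End_Of_File (File) loop
      --  Read fills Buffer (Buffer'First .. Last) with untouched bytes.
      SIO.Read (File, Buffer, Last);

      --  Manual line-end detection: count LF bytes in this chunk.
      for I in Buffer'First .. Last loop
         if Buffer (I) = LF then
            Lines := Lines + 1;
         end if;
      end loop;
   end loop;

   SIO.Close (File);
   Ada.Text_IO.Put_Line ("LF bytes seen:" & Natural'Image (Lines));
end Read_Raw;
```

A real reader would also have to handle a CR before the LF and lines
that span two chunks; the sketch only shows that the bytes arrive as
they are on disk.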

> If you are using Windows calls with the "W" suffix, then all strings 
> there are already UTF-16 and you don't need to convert anything.
OK, so if I stay with win32 functions, I'll only get UTF-16; if I deal
with external text sources (like files) or the Ada standard library, I'll
have to handle other text encodings such as UTF-8, UCS-2, etc.
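
For the second case, a small sketch of the conversion step, assuming the
external text arrives as UTF-8 bytes and has to be passed to a "W"
function afterwards. Ada.Strings.UTF_Encoding.Conversions.Convert is the
standard routine; the procedure name To_UTF16_Demo and the sample bytes
are only for illustration, and the trailing NUL reflects that the "W"
calls take null-terminated wide strings.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Conversions;

procedure To_UTF16_Demo is
   use Ada.Strings.UTF_Encoding;

   --  UTF-8 bytes of U+00E9, as they might come from a file.
   From_File : constant UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#);

   --  Convert to UTF-16 code units and append the terminating NUL.
   For_Win32 : constant Wide_String :=
     Ada.Strings.UTF_Encoding.Conversions.Convert (From_File)
       & Wide_Character'Val (0);
begin
   --  Print the value of each 16-bit code unit.
   for W of For_Win32 loop
      Ada.Text_IO.Put_Line (Integer'Image (Wide_Character'Pos (W)));
   end loop;
end To_UTF16_Demo;
```

This should print 233 (16#E9#) followed by 0, i.e. a single UTF-16 code
unit plus the terminator.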
