comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: win32 interfacing check (SetClipboardData)
Date: Sat, 2 Sep 2017 14:29:35 +0200
Message-ID: <ooe87f$1fo0$1@gioia.aioe.org>
In-Reply-To: 59aa7c2f$0$9397$426a74cc@news.free.fr

On 2017-09-02 11:38, Xavier Petit wrote:
> On 01/09/2017 at 15:10, Dmitry A. Kazakov wrote:
>> On 01/09/2017 14:51, Xavier Petit wrote:
>>> Thanks but even with Set_Clipboard 
>>> (Ada.[Wide_]Wide_Text_IO.Get_Line); I was getting weird clipboard 
>>> text without -gnatW8 flag.
>>
>> But these are not UTF-8! They are UCS-2 and UCS-4.
> Yes, but having a look at:
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html 

These are about the source code encoding, not about run-time I/O.

> So it means Get_Line returns a Wide_Wide_String without the UCS-4
> encoding? Because Encode doesn't return UTF-(8/16) encoding without
> the flag.

It depends on the input file. Get_Line will work only if the text file 
you are reading from is UCS-4 encoded. Where did you get such files?

AFAIK, the GNAT implementation supports the Form parameter in Open,
where you can specify the file encoding if it differs from the string
encoding, e.g. UTF-8 when Wide_Wide_Text_IO deals with UCS-4. I suppose
that should recode the input into the designated string encoding.
However, I have never tried this and do not intend to.
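
An untested sketch of that Form approach, assuming GNAT and its
documented "WCEM=8" form string for UTF-8. The file name input.txt and
the procedure name are made up for illustration, and the form string is
GNAT-specific, not portable Ada:

```ada
with Ada.Text_IO;
with Ada.Wide_Wide_Text_IO;

procedure Read_UTF8_Via_Form is
   package WWIO renames Ada.Wide_Wide_Text_IO;

   File : WWIO.File_Type;
begin
   --  "WCEM=8" is a GNAT form string asking for UTF-8 as the external
   --  encoding of this file. "input.txt" is a hypothetical file name.
   WWIO.Open (File => File,
              Mode => WWIO.In_File,
              Name => "input.txt",
              Form => "WCEM=8");

   while not WWIO.End_Of_File (File) loop
      declare
         Line : constant Wide_Wide_String := WWIO.Get_Line (File);
      begin
         --  If the recoding works as supposed, Line'Length counts
         --  code points rather than UTF-8 octets.
         Ada.Text_IO.Put_Line (Natural'Image (Line'Length));
      end;
   end loop;

   WWIO.Close (File);
end Read_UTF8_Via_Form;
```

Printing only the line lengths keeps the check independent of how the
console encodes wide characters on output.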

>> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved 
>> without BOM), you should use Ada.Streams.Stream_IO, best in binary 
>> mode if you are using GNAT.
>>
>> You will have to detect line ends manually, but at least there will
>> be a guarantee that the run-time does not mangle anything.
> Ok, “binary mode”, do you mean using Stream_Element_Array?
> 
>> If you are using Windows calls with the "W" suffix, then all strings 
>> there are already UTF-16 and you don't need to convert anything.
> Ok, if I stay with win32 functions, I'll get only UTF-16; if I mess
> with external text sources (like files) or the Ada standard, I'll
> deal with other text encoding formats like UTF-8, UCS-2, etc.

When dealing with external files, I read them using Stream_IO and
recode manually to UTF-8 if necessary. Luckily, the supply of files
encoded in Latin-1, UCS-2 and other increasingly idiotic formats has
almost dried up in recent years. So there is little to worry about...
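
A minimal sketch of reading through Stream_IO and detecting line ends
by hand, under some assumptions: the file is already UTF-8, it fits in
memory (and on the stack), and Stream_Element is 8 bits wide, as it is
in GNAT. The file name and all identifiers are illustrative only:

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Read_Lines_Binary is
   package SIO renames Ada.Streams.Stream_IO;

   LF : constant Stream_Element := 10;
   CR : constant Stream_Element := 13;

   --  Copy raw octets into a String without interpreting them, so
   --  UTF-8 sequences pass through unchanged (assumes 8-bit elements).
   function To_String (Data : Stream_Element_Array) return String is
      Result : String (1 .. Data'Length);
   begin
      for I in Data'Range loop
         Result (Natural (I - Data'First) + 1) :=
           Character'Val (Data (I));
      end loop;
      return Result;
   end To_String;

   File : SIO.File_Type;
begin
   SIO.Open (File, SIO.In_File, "input.txt");  --  hypothetical name

   declare
      --  Whole file in one buffer; large files would need chunking.
      Buffer : Stream_Element_Array
                 (1 .. Stream_Element_Offset (SIO.Size (File)));
      Last   : Stream_Element_Offset;
      First  : Stream_Element_Offset := Buffer'First;
      Stop   : Stream_Element_Offset;
   begin
      SIO.Read (File, Buffer, Last);
      SIO.Close (File);

      for I in Buffer'First .. Last loop
         if Buffer (I) = LF then
            Stop := I - 1;
            --  Drop the CR of a CR/LF pair, if there is one.
            if Stop >= First and then Buffer (Stop) = CR then
               Stop := Stop - 1;
            end if;
            --  The octets are written as they are, still UTF-8.
            Ada.Text_IO.Put_Line (To_String (Buffer (First .. Stop)));
            First := I + 1;
         end if;
      end loop;

      --  Last line without a terminating LF.
      if First <= Last then
         Ada.Text_IO.Put_Line (To_String (Buffer (First .. Last)));
      end if;
   end;
end Read_Lines_Binary;
```

Splitting on the LF octet is safe for UTF-8 because octets below 128
never occur inside a multi-byte sequence. A file in another encoding
would have to be recoded before or after this step.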

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

