From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00,FROM_SUSPICIOUS_NTLD
	autolearn=no autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!news.gegeweb.eu!gegeweb.org!usenet-fr.net!proxad.net!feeder1-2.proxad.net!cleanfeed3-a.proxad.net!nnrp2-2.free.fr!not-for-mail
Subject: Re: win32 interfacing check (SetClipboardData)
Newsgroups: comp.lang.ada
References: <59a5ce50$0$7168$426a74cc@news.free.fr>
 <oo6nlo$dcf$1@gioia.aioe.org> <59a706d2$0$3723$426a74cc@news.free.fr>
 <oo7a0f$1d9t$1@gioia.aioe.org> <59a957e3$0$31612$426a74cc@news.free.fr>
 <oobm80$1ln1$1@gioia.aioe.org>
From: Xavier Petit <le@vieux.pro>
Date: Sat, 2 Sep 2017 11:38:55 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.3.0
MIME-Version: 1.0
In-Reply-To: <oobm80$1ln1$1@gioia.aioe.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: fr
Content-Transfer-Encoding: 8bit
Message-ID: <59aa7c2f$0$9397$426a74cc@news.free.fr>
Organization: Guest of ProXad - France
NNTP-Posting-Date: 02 Sep 2017 11:38:55 CEST
NNTP-Posting-Host: 78.217.21.11
X-Trace: 1504345135 news-3.free.fr 9397 78.217.21.11:50789
X-Complaints-To: abuse@proxad.net
Xref: news.eternal-september.org comp.lang.ada:47883
Date: 2017-09-02T11:38:55+02:00
List-Id: <comp.lang.ada>

On 01/09/2017 at 15:10, Dmitry A. Kazakov wrote:
> On 01/09/2017 14:51, Xavier Petit wrote:
>> Thanks but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line); 
>> I was getting weird clipboard text without -gnatW8 flag.
> 
> But these are not UTF-8! They are UCS-2 and UCS-4.
Yes, but having a look at:
https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html

It appears that without the -gnatW8 flag, "Brackets Coding" is the default:
- “In this encoding, a wide character is represented by the following 
eight character sequence: [...]”
- “In this encoding, a wide wide character is represented by the 
following ten or twelve byte character sequence”

...and with the flag, "UTF-8 Coding" is used: “A wide character is 
represented using UCS Transformation Format 8 (UTF-8)”

I think I'm still missing something, because one thing is sure:
Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode 
(Ada.Wide_Wide_Text_IO.Get_Line) does not work without the UTF-8 flag...

From the UTF_Encoding.Wide_Wide_Strings package:
“
The encoding routines take a Wide_Wide_String as input and encode the
result using the specified UTF encoding method.
Encode Wide_Wide_String using UTF-8 encoding
Encode Wide_Wide_String using UTF_16 encoding
”
So does it mean that Get_Line returns a Wide_Wide_String that is not 
UCS-4 encoded? I ask because Encode doesn't return valid UTF-8 or UTF-16 
without the flag.
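
To separate the two questions, here is a small sketch (the procedure 
name is mine, purely for illustration). It builds the Wide_Wide_String 
from code points instead of reading it with Get_Line, so neither the 
source encoding nor -gnatW8 is involved. If Encode gives the expected 
bytes here, then the problem is probably in how Wide_Wide_Text_IO 
decodes the input without the flag, not in Encode itself; that part is 
my assumption from the GNAT pages above.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Encode_Demo is
   use Ada.Strings.UTF_Encoding;

   --  U+00E9 (e acute) followed by U+20AC (euro sign), given as code
   --  points so that the source file encoding does not matter.
   Text : constant Wide_Wide_String :=
     (Wide_Wide_Character'Val (16#E9#),
      Wide_Wide_Character'Val (16#20AC#));

   --  Standard Ada 2012 routine: Wide_Wide_String in, UTF-8 bytes out.
   Encoded : constant UTF_8_String := Wide_Wide_Strings.Encode (Text);
begin
   Ada.Text_IO.Put_Line ("Characters :" & Integer'Image (Text'Length));
   Ada.Text_IO.Put_Line ("UTF-8 bytes:" & Integer'Image (Encoded'Length));

   --  Dump every encoded byte as a decimal value.
   for C of Encoded loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end Encode_Demo;
```

This should report 2 characters and 5 UTF-8 bytes (C3 A9 for U+00E9, 
E2 82 AC for U+20AC), with or without -gnatW8.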

> ([Wide_]Wide_Text_IO should never be used, there is no single case one 
> would need these.)
Yeah, I'm gonna try not to use the Wide_Wide packages. One thing I liked 
with Wide_Wide_String is that the 'Length attribute gives the number of 
characters rather than the number of bytes.

> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved 
> without BOM), you should use Ada.Streams.Stream_IO, best in binary mode 
> if you are using GNAT.
> 
> You will have to detect line ends manually, but at least there will be 
> guaranty that the run-time does not mangle anything.
OK, by “binary mode”, do you mean reading into a Stream_Element_Array?
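
If I understand correctly, it would be something like the sketch below. 
The file name "input.txt", the procedure name and the 4096-element 
buffer are placeholders of mine. It only uses the standard 
Ada.Streams.Stream_IO calls, reads the raw bytes and counts the line 
feeds by hand, so the run-time never interprets the text.

```ada
with Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Read_Raw is
   use Ada.Streams;

   File   : Stream_IO.File_Type;
   Buffer : Stream_Element_Array (1 .. 4096);
   Last   : Stream_Element_Offset;
   Total  : Stream_Element_Count := 0;
   Lines  : Natural := 0;
begin
   --  "input.txt" is a placeholder for a UTF-8 file saved without BOM.
   Stream_IO.Open (File, Stream_IO.In_File, "input.txt");

   while not Stream_IO.End_Of_File (File) loop
      --  Read fills Buffer (1 .. Last) with untouched bytes.
      Stream_IO.Read (File, Buffer, Last);
      Total := Total + Last;

      --  Manual line-end detection: count LF (16#0A#) bytes.
      for I in Buffer'First .. Last loop
         if Buffer (I) = 16#0A# then
            Lines := Lines + 1;
         end if;
      end loop;
   end loop;

   Stream_IO.Close (File);

   Ada.Text_IO.Put_Line
     ("Bytes:" & Stream_Element_Count'Image (Total));
   Ada.Text_IO.Put_Line ("Line feeds:" & Natural'Image (Lines));
end Read_Raw;
```

Since a UTF-8 continuation byte is never equal to 16#0A#, looking for 
LF byte by byte is safe even in the middle of multi-byte characters.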

> If you are using Windows calls with the "W" suffix, then all strings 
> there are already UTF-16 and you don't need to convert anything.
OK, so if I stay with win32 "W" functions, I'll only get UTF-16. If I 
deal with external text sources (like files) or the Ada standard 
library, I'll have to handle other text encodings such as UTF-8, UCS-2, 
etc.
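
For the boundary between the two worlds, here is a sketch of what I 
have in mind, using the standard Ada.Strings.UTF_Encoding.Conversions 
package. The procedure name is mine, and the UTF-8 input is hard-coded 
as bytes to stand in for text read from a file. It converts UTF-8 to 
UTF-16 and appends the terminating NUL before the Wide_String would be 
handed to a "W" call (the call itself is not shown).

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Conversions;

procedure To_UTF_16 is
   use Ada.Strings.UTF_Encoding;

   --  U+00E9 as raw UTF-8 bytes (C3 A9), e.g. as read from a file.
   Raw : constant UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#);

   --  UTF-16 code units followed by a terminating NUL.
   Wide : constant Wide_String :=
     Conversions.Convert (Raw) & Wide_Character'Val (0);
begin
   Ada.Text_IO.Put_Line
     ("UTF-16 units (with NUL):" & Integer'Image (Wide'Length));
   Ada.Text_IO.Put_Line
     ("First unit:"
      & Integer'Image (Wide_Character'Pos (Wide (Wide'First))));
end To_UTF_16;
```

Here the two UTF-8 bytes become a single UTF-16 code unit (233, i.e. 
16#E9#), so Wide'Length is 2 once the NUL is counted.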