Subject: Re: win32 interfacing check (SetClipboardData)
Newsgroups: comp.lang.ada
From: Xavier Petit
Date: Sat, 2 Sep 2017 11:38:55 +0200
Message-ID: <59aa7c2f$0$9397$426a74cc@news.free.fr>

On 01/09/2017 at 15:10, Dmitry A. Kazakov wrote:
> On 01/09/2017 14:51, Xavier Petit wrote:
>> Thanks but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line);
>> I was getting weird clipboard text without -gnatW8 flag.
>
> But these are not UTF-8! They are UCS-2 and UCS-4.

Yes, but having a look at:

https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html

it appears that without the -gnatW8 flag, "Brackets Coding" is the default:

- “In this encoding, a wide character is represented by the following eight character sequence: [...]”
- “In this encoding, a wide wide character is represented by the following ten or twelve byte character sequence”

...and with the flag, "UTF-8 Coding" is used:

“A wide character is represented using UCS Transformation Format 8 (UTF-8)”

I think I'm still missing something, because one thing is sure:

Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Ada.Wide_Wide_Text_IO.Get_Line)

does not work without the UTF-8 flag...

From the UTF_Encoding.Wide_Wide_Strings package:

“
The encoding routines take a Wide_Wide_String as input and encode the result using the specified UTF encoding method.
Encode Wide_Wide_String using UTF-8 encoding
Encode Wide_Wide_String using UTF_16 encoding
”

So does this mean that Get_Line returns a Wide_Wide_String that is not UCS-4 encoded? Because Encode doesn't return UTF-8 or UTF-16 without the flag.

> ([Wide_]Wide_Text_IO should never be used, there is no single case one
> would need these.)

Yeah, I'm going to try not to use the Wide_Wide packages. One thing I liked about Wide_Wide_String is that its 'Length attribute is correct.

> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved
> without BOM), you should use Ada.Streams.Stream_IO, best in binary mode
> if you are using GNAT.
>
> You will have to detect line ends manually, but at least there will be
> guaranty that the run-time does not mangle anything.

OK, by “binary mode”, do you mean using Stream_Element_Array?

> If you are using Windows calls with the "W" suffix, then all strings
> there are already UTF-16 and you don't need to convert anything.

OK. If I stay with win32 functions, I'll get only UTF-16; if I deal with external text sources (like files) or the Ada standard library, I'll have to handle other text encodings such as UTF-8, UCS-2, etc.
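
To make the difference between the encodings mentioned above concrete, here is a minimal sketch. It is written in Python purely as an illustration, not in Ada, and it does not call GNAT or Win32. The helper names are hypothetical. The last function models one possible explanation of the "weird clipboard text" (UTF-8 input read one byte per character, then encoded to UTF-8 again); that is an assumption, not a confirmed diagnosis of what the GNAT run-time does without -gnatW8.

```python
# Illustration only: Python strings stand in for the Ada types in the thread.
# 'a', e-acute, euro sign, and one emoji that lies outside the BMP.
text = "a\u00e9\u20ac\U0001F600"


def code_points(s: str) -> int:
    """Number of code points, comparable to Wide_Wide_String'Length."""
    return len(s)


def utf8_bytes(s: str) -> int:
    """Number of bytes in the UTF-8 form of the text."""
    return len(s.encode("utf-8"))


def utf16_units(s: str) -> int:
    """Number of 16-bit code units, the unit used by Win32 "W" functions."""
    return len(s.encode("utf-16-le")) // 2


def misread_then_encode(s: str) -> bytes:
    """ASSUMED failure mode: UTF-8 bytes are read one byte per character
    (as Latin-1) and the result is then encoded to UTF-8 a second time."""
    return s.encode("utf-8").decode("latin-1").encode("utf-8")


print(code_points(text), utf8_bytes(text), utf16_units(text))
print(misread_then_encode(text) == text.encode("utf-8"))
```

For this sample the three counts are 4, 10 and 5, which is why only the code-point view gives the "correct" length, and why a buffer passed to a "W" function has to be sized in UTF-16 code units rather than characters. The second line prints False: the doubly encoded bytes differ from the real UTF-8 form.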