From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: win32 interfacing check (SetClipboardData)
Date: Sat, 2 Sep 2017 14:29:35 +0200

On 2017-09-02 11:38, Xavier Petit wrote:
> On 01/09/2017 at 15:10, Dmitry A. Kazakov wrote:
>> On 01/09/2017 14:51, Xavier Petit wrote:
>>> Thanks but even with Set_Clipboard
>>> (Ada.[Wide_]Wide_Text_IO.Get_Line); I was getting weird clipboard
>>> text without -gnatW8 flag.
>>
>> But these are not UTF-8! They are UCS-2 and UCS-4.
> Yes but having a look at :
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html

These are about the source code encoding, not about run-time I/O.

> So it means Get_Line returns a Wide_Wide_String without the UCS-4
> encoding ? because Encode doesn't return UTF-(8/16) encoding without the
> flag.

It depends on the input file. Get_Line will work only if the text file
you are reading from is UCS-4 encoded. Where did you get such files?

AFAIK, the GNAT implementation supports the Form parameter in Open, where
you can specify the file encoding if that is different from the string
encoding, e.g. UTF-8 when Wide_Wide_Text_IO deals with UCS-4. I suppose
that should recode the input into the designated string encoding.
However, I have never tried this and do not intend to.

>> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved
>> without BOM), you should use Ada.Streams.Stream_IO, best in binary
>> mode if you are using GNAT.
>>
>> You will have to detect line ends manually, but at least there will be
>> a guarantee that the run-time does not mangle anything.
> Ok, “binary mode”, do you mean using Stream_Element_Array ?
>
>> If you are using Windows calls with the "W" suffix, then all strings
>> there are already UTF-16 and you don't need to convert anything.
> Ok, if I stay with win32 functions, I'll get only UTF-16, if I mess with
> external text sources (like files) or Ada standard, I'll deal with
> other text encoding formats like UTF-8, UCS-2, etc...

When dealing with external files I read them using Stream_IO and recode
manually to UTF-8 if necessary. Luckily, of late the supply of files
encoded in Latin-1, UCS-2 and other increasingly idiotic formats is
almost depleted. So there is little to worry about...

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
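
The Stream_IO approach described above (read the file as raw octets, find the line ends by hand, recode explicitly) might look roughly like the following. This is a minimal sketch, not code from the thread: the file name input.txt, the procedure name and the helpers Read_Raw and Handle_Line are made up for illustration, and it assumes an Ada 2012 compiler, a UTF-8 file without BOM, and a file small enough to buffer on the stack. The UTF-8 to UTF-16 step uses the standard Ada.Strings.UTF_Encoding.Conversions.Convert.

```ada
with Ada.Streams;                          use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Conversions;
with Ada.Text_IO;

procedure Read_UTF8_Lines is
   package SIO renames Ada.Streams.Stream_IO;

   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;
   subtype UTF_16_Wide_String is
     Ada.Strings.UTF_Encoding.UTF_16_Wide_String;

   LF : constant Character := Character'Val (10);
   CR : constant Character := Character'Val (13);

   --  Hypothetical helper: read the whole file as raw octets, so the
   --  text I/O run-time never gets a chance to recode anything.
   function Read_Raw (Name : String) return UTF_8_String is
      File : SIO.File_Type;
   begin
      SIO.Open (File, SIO.In_File, Name);
      declare
         Size   : constant Natural := Natural (SIO.Size (File));
         Buffer : Stream_Element_Array (1 .. Stream_Element_Offset (Size));
         Last   : Stream_Element_Offset;
         Result : UTF_8_String (1 .. Size);
      begin
         SIO.Read (File, Buffer, Last);
         SIO.Close (File);
         --  Copy octet by octet; each Character holds one UTF-8 octet.
         for I in 1 .. Natural (Last) loop
            Result (I) := Character'Val (Buffer (Stream_Element_Offset (I)));
         end loop;
         return Result (1 .. Natural (Last));
      end;
   end Read_Raw;

   --  Hypothetical helper: recode one line to UTF-16, the encoding the
   --  Win32 "W" calls expect, and report the lengths.
   procedure Handle_Line (Line : UTF_8_String) is
      Wide : constant UTF_16_Wide_String :=
        Ada.Strings.UTF_Encoding.Conversions.Convert (Line);
   begin
      Ada.Text_IO.Put_Line
        (Natural'Image (Line'Length) & " octets," &
         Natural'Image (Wide'Length) & " UTF-16 code units");
   end Handle_Line;

   Raw   : constant UTF_8_String := Read_Raw ("input.txt");  --  assumed name
   First : Positive := Raw'First;
begin
   --  Manual line splitting on LF, dropping a preceding CR if present.
   for I in Raw'Range loop
      if Raw (I) = LF then
         declare
            Stop : Natural := I - 1;
         begin
            if Stop >= First and then Raw (Stop) = CR then
               Stop := Stop - 1;
            end if;
            Handle_Line (Raw (First .. Stop));
         end;
         First := I + 1;
      end if;
   end loop;

   --  Last line when the file does not end with a line terminator.
   if First <= Raw'Last then
      Handle_Line (Raw (First .. Raw'Last));
   end if;
end Read_UTF8_Lines;
```

Splitting at the octet level is safe here because the octets 10 (LF) and 13 (CR) never occur inside a multi-octet UTF-8 sequence. If the input is not valid UTF-8, Convert raises Ada.Strings.UTF_Encoding.Encoding_Error, and Open raises Name_Error when input.txt does not exist.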