comp.lang.ada
* win32 interfacing check (SetClipboardData)
@ 2017-08-29 20:28 Xavier Petit
  2017-08-30 16:04 ` Dmitry A. Kazakov
  0 siblings, 1 reply; 10+ messages in thread
From: Xavier Petit @ 2017-08-29 20:28 UTC


Hi, I would like to know if this Win32 code is “correct” from your PoV
or could be written in a better way, especially this block:

declare
    Tmp : Wide_String (1 .. Source'Length + 1) with Address => AMem;
begin
    Tmp := Source & Wide_Character'First;
end;

Complete code: https://pastebin.com/raw/CnUbGVyk

It copies the Source Wide_String to the Windows clipboard and needs
Win32Ada and the -gnatW8 and -gnata compilation flags (to get correct
Unicode characters and to enable assertions).

Thanks in advance

-- 
Xavier Petit


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: win32 interfacing check (SetClipboardData)
  2017-08-29 20:28 win32 interfacing check (SetClipboardData) Xavier Petit
@ 2017-08-30 16:04 ` Dmitry A. Kazakov
  2017-08-30 18:41   ` Xavier Petit
  2017-08-31  1:41   ` Randy Brukardt
  0 siblings, 2 replies; 10+ messages in thread
From: Dmitry A. Kazakov @ 2017-08-30 16:04 UTC


On 29/08/2017 22:28, Xavier Petit wrote:
> Hi, I would like to know if this win32 code is “correct” from your PoV 
> or could be written in a better way, especially this block :
> 
> declare
>     Tmp : Wide_String (1 .. Source'Length + 1) with Address => AMem;
> begin
>     Tmp := Source & Wide_Character'First;
> end;

It looks OK. Except that formally Wide_String is UCS-2 and Windows is 
UTF-16.

I would use UTF-8 encoded string as the input and recode it into UTF-16 
to have CF_UNICODETEXT, e.g. by using MultiByteToWideChar.
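
To make the UCS-2 versus UTF-16 difference concrete, here is a small portable sketch (not Win32 code, and not a replacement for MultiByteToWideChar) that uses the standard Ada 2012 package Ada.Strings.UTF_Encoding.Conversions to recode a UTF-8 String into a UTF-16 Wide_String. The procedure name is made up. A character outside the Basic Multilingual Plane, U+10437 here, becomes two Wide_Character code units (a surrogate pair), so it cannot be stored as a single Wide_Character.

```ada
with Ada.Strings.UTF_Encoding.Conversions;
with Ada.Text_IO;

procedure UTF_16_Demo is
   use Ada.Strings.UTF_Encoding;

   --  U+10437 DESERET SMALL LETTER YEE, spelled as its four UTF-8 bytes
   --  so that this source file stays pure ASCII (no -gnatW8 needed).
   Yee_UTF_8 : constant UTF_8_String :=
     Character'Val (16#F0#) & Character'Val (16#90#) &
     Character'Val (16#90#) & Character'Val (16#B7#);

   --  Recode UTF-8 into UTF-16, the encoding CF_UNICODETEXT expects.
   Yee_UTF_16 : constant UTF_16_Wide_String :=
     Ada.Strings.UTF_Encoding.Conversions.Convert (Yee_UTF_8);
begin
   --  One character, but two UTF-16 code units (a surrogate pair).
   Ada.Text_IO.Put_Line
     ("Code units:" & Integer'Image (Yee_UTF_16'Length));
   for C of Yee_UTF_16 loop
      Ada.Text_IO.Put_Line (Integer'Image (Wide_Character'Pos (C)));
   end loop;
end UTF_16_Demo;
```

Run as a main program, it should report 2 code units with the values 55297 (16#D801#) and 56375 (16#DC37#).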

> Complete code : https://pastebin.com/raw/CnUbGVyk
> 
> It copies the Source Wide_String in the Windows clipboard and needs 
> Win32Ada & -gnatW8 -gnata compilation flags (in order to get correct 
> unicode characters and assertions enabled)

Why should it need gnatW8 or gnata? You get characters by encoding them, 
I suppose.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


* Re: win32 interfacing check (SetClipboardData)
  2017-08-30 16:04 ` Dmitry A. Kazakov
@ 2017-08-30 18:41   ` Xavier Petit
  2017-08-30 21:17     ` Dmitry A. Kazakov
  2017-08-31  1:41   ` Randy Brukardt
  1 sibling, 1 reply; 10+ messages in thread
From: Xavier Petit @ 2017-08-30 18:41 UTC


On 30/08/2017 18:04, Dmitry A. Kazakov wrote:
> It looks OK. Except that formally Wide_String is UCS-2 and Windows is 
> UTF-16.
Thank you for pointing that out.

> I would use UTF-8 encoded string as the input and recode it into UTF-16 
> to have CF_UNICODETEXT, e.g. by using MultiByteToWideChar.
Thank you, but I always get ERROR_INVALID_PARAMETER from GetLastError
when I call it like this:

UTF16_Code_Page : constant := 1200;

Length := MultiByteToWideChar (CodePage       => UTF16_Code_Page,
                                DwFlags        => MB_PRECOMPOSED,
                                LpMultiByteStr => Addr (Source),
                                CchMultiByte   => -1,
                                LpWideCharStr  => Encoded,
                                CchWideChar    => 0);

> Why should it need gnatW8 or gnata? You get characters by encoding them, 
> I suppose.
I use -gnata to enable assertions (failures raise
Ada.Assertions.Assertion_Error); I could use pragma
Assertion_Policy (Check) instead.
I tested the Wide_String "123〠" with and without -gnatW8.
It only worked with the flag, but thanks to you I now know that my
procedure was wrong anyway.

If I call this procedure with Source => "𐐷𤭢" and -gnatW8, I get the
error “literal out of range of type Standard.Wide_Character”.
Without the flag, the code compiles but the clipboard contains garbled
text.

So I have a working version of the code:
https://pastebin.com/raw/5ss5m5QY
...but it uses Wide_Wide_String (UCS-4/UTF-32) as input *and* the
-gnatW8 flag. So not at all like your idea (a UTF-8 String and no
special flag).

Thank you very much for your help



* Re: win32 interfacing check (SetClipboardData)
  2017-08-30 18:41   ` Xavier Petit
@ 2017-08-30 21:17     ` Dmitry A. Kazakov
  2017-09-01 12:51       ` Xavier Petit
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry A. Kazakov @ 2017-08-30 21:17 UTC


On 2017-08-30 20:41, Xavier Petit wrote:
> On 30/08/2017 18:04, Dmitry A. Kazakov wrote:
>> It looks OK. Except that formally Wide_String is UCS-2 and Windows is 
>> UTF-16.
> Thank you for pointing that out.
> 
>> I would use UTF-8 encoded string as the input and recode it into 
>> UTF-16 to have CF_UNICODETEXT, e.g. by using MultiByteToWideChar.
> Thank you but I always get the error ERROR_INVALID_PARAMETER from 
> GetLastError, using it like this :
> 
> UTF16_Code_Page : constant := 1200;
> 
> Length := MultiByteToWideChar (CodePage       => UTF16_Code_Page,
>                                 DwFlags        => MB_PRECOMPOSED,
>                                 LpMultiByteStr => Addr (Source),
>                                 CchMultiByte   => -1,
>                                 LpWideCharStr  => Encoded,
>                                 CchWideChar    => 0);

The output buffer must be null when its length is zero. And it looks
like MB_PRECOMPOSED does not work. So (without error handling):

--
-- UTF-8 to UTF-16 conversion using MultiByteToWideChar
-- (Text must be NUL-terminated, see below)
--
    function Convert (Text : String) return Wide_String is
       use type Win32.INT;
       Length : Win32.INT;
    begin
       Length := MultiByteToWideChar -- Determine length
                 (  CodePage       => CP_UTF8,
                    DwFlags        => 0,     -- 0 or MB_ERR_INVALID_CHARS for UTF-8
                    LpMultiByteStr => Addr (Text),
                    CchMultiByte   => -1,    -- Input is NUL-terminated
                    LpWideCharStr  => null,  -- No buffer: only query the size
                    CchWideChar    => 0
                 );
       -- Length = 0 means failure (see GetLastError), not checked here.
       -- Otherwise it includes the terminating NUL.
       declare
          Result : Wide_String (1..Integer (Length));
       begin
          Length := MultiByteToWideChar -- Do conversion
                    (  CodePage       => CP_UTF8,
                       DwFlags        => 0,
                       LpMultiByteStr => Addr (Text),
                       CchMultiByte   => -1,
                       LpWideCharStr  => Addr (Result),
                       CchWideChar    => Length
                    );
          Put_Line (Win32.INT'Image (Length)); -- Debug output (Ada.Text_IO)
          return Result; -- Ends with Wide_Character'Val (0)
       end;
    end Convert;

And Text must end with Character'Val (0), since CchMultiByte => -1
tells Windows that the input is null-terminated. Of course, you can
GlobalAlloc the buffer for the second call instead of returning an Ada
string.

>> Why should it need gnatW8 or gnata? You get characters by encoding 
>> them, I suppose.
> I use gnata to trigger Ada.Assertions errors, I could use the pragma 
> Assertion_Policy (Check) too.
> I have tested the following Wide_String : "123〠" with or without -gnatW8.
> It only worked with, but thanks to you I know that my procedure was 
> wrong anyway.

I see. IMO, it is a bad idea to use non-ASCII characters in the source.
When I need a special character, I take its Unicode code point and
convert that to a String.
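
One way to do that, sketched with the standard Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings (an assumption about how the conversion could be done, not necessarily Dmitry's own code): build a Wide_Wide_Character from the code point and encode it as UTF-8. The helper name To_UTF_8 is hypothetical. The source stays pure ASCII, so no -gnatW8 is needed.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Code_Point_Demo is
   --  Hypothetical helper: encode one Unicode code point as UTF-8.
   function To_UTF_8 (Code : Natural) return String is
     (Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
        (Wide_Wide_String'(1 => Wide_Wide_Character'Val (Code))));

   --  "123" followed by U+3020 POSTAL MARK FACE, written without any
   --  non-ASCII character in the source.
   Text : constant String := "123" & To_UTF_8 (16#3020#);
begin
   Ada.Text_IO.Put_Line ("UTF-8 bytes:" & Integer'Image (Text'Length));
end Code_Point_Demo;
```

Here "123" plus U+3020 (the "〠" from the earlier test) takes 6 bytes, because U+3020 is encoded as E3 80 A0.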

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



* Re: win32 interfacing check (SetClipboardData)
  2017-08-30 16:04 ` Dmitry A. Kazakov
  2017-08-30 18:41   ` Xavier Petit
@ 2017-08-31  1:41   ` Randy Brukardt
  2017-09-01 12:53     ` Xavier Petit
  1 sibling, 1 reply; 10+ messages in thread
From: Randy Brukardt @ 2017-08-31  1:41 UTC


> On 29/08/2017 22:28, Xavier Petit wrote:
>> Hi, I would like to know if this win32 code is "correct" from your PoV or 
>> could be written in a better way, especially this block :
>>
>> declare
>>  Tmp : Wide_String (1 .. Source'Length + 1) with Address => AMem;
>> begin
>>  Tmp := Source & Wide_Character'First;
>> end;
>
> It looks OK.

For me, using an address clause for anything other than interfacing to
hardware is wrong. We certainly didn't do anything like this in Claw when
implementing the clipboard operations. We used instances of
Unchecked_Conversion to get a pointer of the right type, and then assigned
into that. (Nowadays, I might use an instance of
System.Address_To_Access_Conversions.)
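
A minimal sketch of the Address_To_Access_Conversions approach, assuming GNAT and a constrained buffer subtype: Copy_With_Nul is a hypothetical helper, and a local aliased array stands in for the memory that GlobalLock would return, so the example runs without Win32.

```ada
with System;
with System.Address_To_Access_Conversions;
with Ada.Text_IO;

procedure Address_Conversion_Demo is

   --  Copy Source plus a terminating NUL into the memory at Target, which
   --  must hold at least Source'Length + 1 Wide_Characters (for the
   --  clipboard, Target would be the address returned by GlobalLock).
   procedure Copy_With_Nul (Target : System.Address; Source : Wide_String) is
      subtype Buffer is Wide_String (1 .. Source'Length + 1);
      package Conv is new System.Address_To_Access_Conversions (Buffer);
   begin
      --  Assign through a typed pointer instead of an address clause.
      Conv.To_Pointer (Target).all := Source & Wide_Character'Val (0);
   end Copy_With_Nul;

   --  Local stand-in for the locked global memory block.
   Storage : aliased Wide_String (1 .. 4);
begin
   Copy_With_Nul (Storage'Address, "abc");
   Ada.Text_IO.Put_Line
     ("Last code unit:" & Integer'Image (Wide_Character'Pos (Storage (4))));
end Address_Conversion_Demo;
```

Instantiating the generic inside the procedure lets the constrained buffer subtype depend on Source'Length.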

The code that converts a String parameter into a value in the clipboard list 
looks like:

    procedure Append_Copy(Item : in     String;
                          List : in out Representation_List_Type;
                          Kind : in     Text_Kinds := Text) is
        use type Claw.Win32.HGlobal;
        Mem : Claw.Win32.HGlobal;
        subtype C_String_Type is Interfaces.C.Char_Array(0 .. Item'length);
        type Target_Pointer is access all C_String_Type;
        function Convert is new Ada.Unchecked_Conversion
          (Source => Claw.Win32.HGlobal,
           Target => Target_Pointer);
        N : Interfaces.C.Size_T;
    begin
        Mem := Claw.Low_Level.Miscellaneous.Global_Alloc (
          Claw.Low_Level.Miscellaneous.GMEM_FIXED,
          DWord(C_String_Type'Length*(Interfaces.C.Char'Size/8)));
        if Mem = Claw.Win32.NULL_HGLOBAL then
            Claw.Raise_Windows_Error;
        end if;
        Interfaces.C.To_C(Item, Convert(Mem).all, N);
        Append((Handle => Mem, Format => Text_Kind_Format(Kind),
                Delayed_Renderer => null), List);
    end Append_Copy;

"Append" here adds "Mem" to the Representation_List, giving ownership of the 
handle to the List.

A Wide_String version would work the same way, using the appropriate 
Interfaces.C types.
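
A sketch of the Wide_String variant of the conversion step, assuming the Ada 2005 types Interfaces.C.char16_t and char16_array (16 bits, like a Windows WCHAR). Interfaces.C.wchar_array would also work on Windows, but its element size follows the C compiler's wchar_t, which is 32 bits on most other platforms. The Claw-specific Global_Alloc and Append calls are left out, a local array stands in for the allocated block, and the names are illustrative, not Claw's.

```ada
with Interfaces.C;
with Ada.Text_IO;

procedure Wide_Copy_Demo is
   use type Interfaces.C.size_t;

   Item : constant Wide_String := "Hello";

   --  Same shape as C_String_Type above, but with 16-bit char16_t
   --  elements: Item'Length characters plus the terminating nul.
   subtype C_Wide_String_Type is
     Interfaces.C.char16_array (0 .. Interfaces.C.size_t (Item'Length));

   --  Byte count that would be passed to GlobalAlloc in the real code.
   Byte_Count : constant Natural :=
     C_Wide_String_Type'Length * (Interfaces.C.char16_t'Size / 8);

   --  Local stand-in for the allocated block.
   Target : C_Wide_String_Type;
   N      : Interfaces.C.size_t;
begin
   --  Copies Item and appends char16_nul (Append_Nul defaults to True).
   Interfaces.C.To_C (Item, Target, N);
   Ada.Text_IO.Put_Line ("Bytes:" & Integer'Image (Byte_Count));
   Ada.Text_IO.Put_Line ("Elements written:" & Interfaces.C.size_t'Image (N));
end Wide_Copy_Demo;
```

Here "Hello" needs 6 elements including the nul, so 12 bytes would be requested.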

Note that we were trying for maximum portability (to any sane Ada 95 
compiler for Windows); there's no real need to use the Interfaces.C types on 
GNAT (if that's all you care about).

                              Randy.







* Re: win32 interfacing check (SetClipboardData)
  2017-08-30 21:17     ` Dmitry A. Kazakov
@ 2017-09-01 12:51       ` Xavier Petit
  2017-09-01 13:10         ` Dmitry A. Kazakov
  0 siblings, 1 reply; 10+ messages in thread
From: Xavier Petit @ 2017-09-01 12:51 UTC


On 30/08/2017 23:17, Dmitry A. Kazakov wrote:
> On 2017-08-30 20:41, Xavier Petit wrote:
>> Thank you but I always get the error ERROR_INVALID_PARAMETER from 
>> GetLastError, using it like this :
>>
>> UTF16_Code_Page : constant := 1200;
>>
>> Length := MultiByteToWideChar (CodePage       => UTF16_Code_Page,
>>                                 DwFlags        => MB_PRECOMPOSED,
>>                                 LpMultiByteStr => Addr (Source),
>>                                 CchMultiByte   => -1,
>>                                 LpWideCharStr  => Encoded,
>>                                 CchWideChar    => 0);
> 
> The output must be null when its length is. And it looks like 
> MB_PRECOMPOSED does not work. So (without error handling):
And I was using the wrong code page (UTF-16 instead of UTF-8).
As for why MB_PRECOMPOSED does not work (from the MSDN page for
MultiByteToWideChar):
“Note: For *UTF-8* or code page 54936 (GB18030, starting with Windows
Vista), dwFlags must be set to either 0 or MB_ERR_INVALID_CHARS.
Otherwise, the function fails with ERROR_INVALID_FLAGS.”
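
MB_ERR_INVALID_CHARS makes MultiByteToWideChar fail when it meets an invalid input sequence. A rough portable analogue, sketched below as an illustration only, is that the Ada.Strings.UTF_Encoding conversions raise Encoding_Error on malformed UTF-8. UTF_16_Length is a hypothetical helper.

```ada
with Ada.Strings.UTF_Encoding.Conversions;
with Ada.Text_IO;

procedure Invalid_UTF_8_Demo is
   --  Hypothetical helper: number of UTF-16 code units needed for the
   --  UTF-8 text S, or -1 if S is not well-formed UTF-8.
   function UTF_16_Length (S : String) return Integer is
   begin
      return Ada.Strings.UTF_Encoding.Conversions.Convert (S)'Length;
   exception
      when Ada.Strings.UTF_Encoding.Encoding_Error =>
         return -1;
   end UTF_16_Length;

   --  16#FF# can never occur in well-formed UTF-8.
   Bad : constant String := "ab" & Character'Val (16#FF#);
begin
   Ada.Text_IO.Put_Line (Integer'Image (UTF_16_Length ("abc")));
   Ada.Text_IO.Put_Line (Integer'Image (UTF_16_Length (Bad)));
end Invalid_UTF_8_Demo;
```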

> -- 
> -- UTF-8 to UTF-16 conversion using MultiByteToWideChar
> -- 
It works perfectly, thank you very much. I can now get rid of -gnatW8
and Wide_Wide_String.
>>> Why should it need gnatW8 or gnata? You get characters by encoding 
>>> them, I suppose.
>> I use gnata to trigger Ada.Assertions errors, I could use the pragma 
>> Assertion_Policy (Check) too.
>> I have tested the following Wide_String : "123〠" with or without 
>> -gnatW8.
>> It only worked with, but thanks to you I know that my procedure was 
>> wrong anyway.
> 
> I see. IMO, it is a bad idea to use non-ASCII characters in the source. 
> When I need a special character I take its UNICODE code position and 
> convert that to String.
Thanks, but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line);
I was getting garbled clipboard text without the -gnatW8 flag.


* Re: win32 interfacing check (SetClipboardData)
  2017-08-31  1:41   ` Randy Brukardt
@ 2017-09-01 12:53     ` Xavier Petit
  0 siblings, 0 replies; 10+ messages in thread
From: Xavier Petit @ 2017-09-01 12:53 UTC


On 31/08/2017 03:41, Randy Brukardt wrote:
> For me, using an address clause for anything other than interfacing to
> hardware is wrong. We certainly didn't do anything like this in Claw when
> implementing the clipboard operations. We used instances of
> Unchecked_Conversion to get a pointer of the right type, and then assigned
> into that. (Nowdays, I might use an instance of
> Address_to_Access_Conversions.)
Thank you, this was my first approach (and it worked), but when I tried
the 'Address trick I was surprised that it worked too, so I removed
System.Address_To_Access_Conversions.
In the end I use Win32's To_PWSTR (an Unchecked_Conversion), which seems
simpler to me:

AMem := GlobalLock (HMem);
pragma Assert (AMem /= Null_Address);
Length := MultiByteToWideChar (CodePage       => CP_UTF8,
                                DwFlags        => 0,
                                LpMultiByteStr => Addr (Text),
                                CchMultiByte   => -1,
                                LpWideCharStr  => To_PWSTR (Amem),
                                CchWideChar    => Length);
Thanks for the Claw example.



* Re: win32 interfacing check (SetClipboardData)
  2017-09-01 12:51       ` Xavier Petit
@ 2017-09-01 13:10         ` Dmitry A. Kazakov
  2017-09-02  9:38           ` Xavier Petit
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry A. Kazakov @ 2017-09-01 13:10 UTC


On 01/09/2017 14:51, Xavier Petit wrote:
> Thanks but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line); I 
> was getting weird clipboard text without -gnatW8 flag.

But these are not UTF-8! They are UCS-2 and UCS-4.

([Wide_]Wide_Text_IO should never be used; there is not a single case
where one would need them.)

If you have a UTF-8 encoded file (e.g. created using Notepad++, saved 
without BOM), you should use Ada.Streams.Stream_IO, best in binary mode 
if you are using GNAT.

You will have to detect line ends manually, but at least there will be a
guarantee that the run-time does not mangle anything.
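
A minimal sketch of that approach, assuming GNAT and a UTF-8 file without BOM: the bytes are read with Ada.Streams.Stream_IO into a Stream_Element_Array, copied unchanged into a String, and split into lines by hand. Read_All, For_Each_Line, Show and the file name are hypothetical; the demo writes its own small test file first.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;
with Ada.Text_IO;

procedure Stream_Lines_Demo is
   package SIO renames Ada.Streams.Stream_IO;

   --  Read a whole file as raw bytes; the run-time does no recoding.
   function Read_All (Name : String) return String is
      F : SIO.File_Type;
   begin
      SIO.Open (F, SIO.In_File, Name);
      declare
         Data : Stream_Element_Array
           (1 .. Stream_Element_Offset (SIO.Size (F)));
         Last : Stream_Element_Offset;
      begin
         SIO.Read (F, Data, Last);
         SIO.Close (F);
         declare
            Result : String (1 .. Natural (Last));
         begin
            for I in Result'Range loop
               Result (I) :=
                 Character'Val (Data (Stream_Element_Offset (I)));
            end loop;
            return Result;
         end;
      end;
   end Read_All;

   --  Call Process for each line of Text. Line ends are detected by hand:
   --  LF ends a line, and a CR right before it (Windows style) is dropped.
   procedure For_Each_Line
     (Text    : String;
      Process : not null access procedure (Line : String))
   is
      Start : Positive := Text'First;
   begin
      for I in Text'Range loop
         if Text (I) = ASCII.LF then
            if I > Start and then Text (I - 1) = ASCII.CR then
               Process (Text (Start .. I - 2));
            else
               Process (Text (Start .. I - 1));
            end if;
            Start := I + 1;
         end if;
      end loop;
      if Start <= Text'Last then
         Process (Text (Start .. Text'Last));  --  Last line has no LF
      end if;
   end For_Each_Line;

   Line_Count : Natural := 0;

   procedure Show (Line : String) is
   begin
      Line_Count := Line_Count + 1;
      Ada.Text_IO.Put_Line (Line);  --  UTF-8 bytes passed through as-is
   end Show;

   Name : constant String := "stream_lines_demo.txt";
   F    : SIO.File_Type;
begin
   --  Create a small UTF-8 test file (no BOM) with CR LF line ends; the
   --  second line holds U+00E9 as its UTF-8 bytes C3 A9.
   SIO.Create (F, SIO.Out_File, Name);
   String'Write
     (SIO.Stream (F),
      "first" & ASCII.CR & ASCII.LF &
      "caf" & Character'Val (16#C3#) & Character'Val (16#A9#) &
      ASCII.CR & ASCII.LF);
   SIO.Close (F);

   For_Each_Line (Read_All (Name), Show'Access);
end Stream_Lines_Demo;
```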

If you are using Windows calls with the "W" suffix, then all strings 
there are already UTF-16 and you don't need to convert anything.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



* Re: win32 interfacing check (SetClipboardData)
  2017-09-01 13:10         ` Dmitry A. Kazakov
@ 2017-09-02  9:38           ` Xavier Petit
  2017-09-02 12:29             ` Dmitry A. Kazakov
  0 siblings, 1 reply; 10+ messages in thread
From: Xavier Petit @ 2017-09-02  9:38 UTC


On 01/09/2017 15:10, Dmitry A. Kazakov wrote:
> On 01/09/2017 14:51, Xavier Petit wrote:
>> Thanks but even with Set_Clipboard (Ada.[Wide_]Wide_Text_IO.Get_Line); 
>> I was getting weird clipboard text without -gnatW8 flag.
> 
> But these are not UTF-8! They are UCS-2 and UCS-4.
Yes, but looking at:
https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html

it appears that without the -gnatW8 flag, "Brackets Coding" is the default:
- “In this encoding, a wide character is represented by the following 
eight character sequence: [...]”
- “In this encoding, a wide wide character is represented by the 
following ten or twelve byte character sequence”

...and with the flag, "UTF-8 Coding" is used : “A wide character is 
represented using UCS Transformation Format 8 (UTF-8)”

I think I'm still missing something, because one thing is sure:
Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode
(Ada.Wide_Wide_Text_IO.Get_Line) does not work without the UTF-8 flag...

From the UTF_Encoding.Wide_Wide_Strings package:
“
The encoding routines take a Wide_Wide_String as input and encode the
result using the specified UTF encoding method.
Encode Wide_Wide_String using UTF-8 encoding
Encode Wide_Wide_String using UTF_16 encoding
”
So does that mean Get_Line returns a Wide_Wide_String that is not
UCS-4 encoded? Because Encode doesn't return UTF-8/UTF-16 encoding
without the flag.

> ([Wide_]Wide_Text_IO should never be used, there is no single case one 
> would need these.)
Yeah, I'm going to try not to use the Wide_Wide packages. One thing I
liked about Wide_Wide_String is that 'Length gives the correct character
count.
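
If a correct 'Length is the main attraction of Wide_Wide_String, the number of code points in a UTF-8 String can also be counted directly. A sketch, assuming the input is well-formed UTF-8: every code point has exactly one lead byte, and only continuation bytes fall in 16#80# .. 16#BF#. Code_Point_Count is a hypothetical helper.

```ada
with Ada.Text_IO;

procedure Code_Point_Count_Demo is
   --  Hypothetical helper: number of code points in well-formed UTF-8.
   --  Each code point has exactly one lead byte; the remaining bytes are
   --  continuation bytes in the range 16#80# .. 16#BF#.
   function Code_Point_Count (S : String) return Natural is
      Count : Natural := 0;
   begin
      for C of S loop
         if C not in Character'Val (16#80#) .. Character'Val (16#BF#) then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Code_Point_Count;

   --  U+10437 (F0 90 90 B7) followed by U+24B62 (F0 A4 AD A2).
   S : constant String :=
     Character'Val (16#F0#) & Character'Val (16#90#) &
     Character'Val (16#90#) & Character'Val (16#B7#) &
     Character'Val (16#F0#) & Character'Val (16#A4#) &
     Character'Val (16#AD#) & Character'Val (16#A2#);
begin
   Ada.Text_IO.Put_Line
     ("Bytes:" & Integer'Image (S'Length) &
      ", code points:" & Integer'Image (Code_Point_Count (S)));
end Code_Point_Count_Demo;
```

For the "𐐷𤭢" example this gives 8 bytes but 2 code points, the same count Wide_Wide_String'Length would give.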

> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved 
> without BOM), you should use Ada.Streams.Stream_IO, best in binary mode 
> if you are using GNAT.
> 
> You will have to detect line ends manually, but at least there will be 
> guaranty that the run-time does not mangle anything.
OK, by “binary mode”, do you mean using Stream_Element_Array?

> If you are using Windows calls with the "W" suffix, then all strings 
> there are already UTF-16 and you don't need to convert anything.
OK: if I stay with Win32 functions I'll only get UTF-16; if I deal with
external text sources (like files) or the Ada standard library, I'll have
to handle other text encodings like UTF-8, UCS-2, etc.


* Re: win32 interfacing check (SetClipboardData)
  2017-09-02  9:38           ` Xavier Petit
@ 2017-09-02 12:29             ` Dmitry A. Kazakov
  0 siblings, 0 replies; 10+ messages in thread
From: Dmitry A. Kazakov @ 2017-09-02 12:29 UTC


On 2017-09-02 11:38, Xavier Petit wrote:
> On 01/09/2017 15:10, Dmitry A. Kazakov wrote:
>> On 01/09/2017 14:51, Xavier Petit wrote:
>>> Thanks but even with Set_Clipboard 
>>> (Ada.[Wide_]Wide_Text_IO.Get_Line); I was getting weird clipboard 
>>> text without -gnatW8 flag.
>>
>> But these are not UTF-8! They are UCS-2 and UCS-4.
> Yes but having a look at :
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Character-Set-Control.html
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fCharacter-Encodings.html
> https://gcc.gnu.org/onlinedocs/gnat_ugn/Wide_005fWide_005fCharacter-Encodings.html 

These are about the source code encoding, not about run-time I/O.

> So it means Get_Line returns a Wide_Wide_String without the USC-4 
> encoding ? because Encode doesn't return UTF-(8/16) encoding without the 
> flag.

It depends on the input file. Get_Line will work only if the text file 
you are reading from is UCS-4 encoded. Where did you get such files?

AFAIK, the GNAT implementation supports the Form parameter in Open,
where you can specify the file encoding if it differs from the string
encoding, e.g. UTF-8 when Wide_Wide_Text_IO deals with UCS-4. I suppose
that should recode the input into the designated string encoding.
However, I have never tried this and do not intend to.
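
For reference, the GNAT Reference Manual describes a WCEM=x Form parameter for Wide_Text_IO and Wide_Wide_Text_IO files, where WCEM=8 selects UTF-8. A sketch assuming GNAT (the Form syntax is implementation-specific and not portable); the file name is made up and the demo writes its own UTF-8 test file first.

```ada
with Ada.Streams.Stream_IO;
with Ada.Text_IO;
with Ada.Wide_Wide_Text_IO;

procedure WCEM_Demo is
   package SIO  renames Ada.Streams.Stream_IO;
   package WWIO renames Ada.Wide_Wide_Text_IO;

   Name        : constant String := "wcem_demo.txt";
   Raw         : SIO.File_Type;
   F           : WWIO.File_Type;
   Line_Length : Natural := 0;
   First_Code  : Natural := 0;
begin
   --  Write one line holding U+10437 as raw UTF-8 bytes (F0 90 90 B7).
   SIO.Create (Raw, SIO.Out_File, Name);
   String'Write
     (SIO.Stream (Raw),
      Character'Val (16#F0#) & Character'Val (16#90#) &
      Character'Val (16#90#) & Character'Val (16#B7#) & ASCII.LF);
   SIO.Close (Raw);

   --  GNAT-specific Form string: decode this file as UTF-8 (WCEM=8),
   --  independently of the -gnatW switch used to compile the source.
   WWIO.Open (F, WWIO.In_File, Name, Form => "WCEM=8");
   declare
      Line : constant Wide_Wide_String := WWIO.Get_Line (F);
   begin
      Line_Length := Line'Length;
      if Line_Length > 0 then
         First_Code := Wide_Wide_Character'Pos (Line (Line'First));
      end if;
   end;
   WWIO.Close (F);

   Ada.Text_IO.Put_Line ("Length:" & Integer'Image (Line_Length));
   Ada.Text_IO.Put_Line ("First code point:" & Integer'Image (First_Code));
end WCEM_Demo;
```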

>> If you have a UTF-8 encoded file (e.g. created using Notepad++, saved 
>> without BOM), you should use Ada.Streams.Stream_IO, best in binary 
>> mode if you are using GNAT.
>>
>> You will have to detect line ends manually, but at least there will be 
>> guaranty that the run-time does not mangle anything.
> Ok, “binary mode”, do you mean using Stream_Element_Array ?
> 
>> If you are using Windows calls with the "W" suffix, then all strings 
>> there are already UTF-16 and you don't need to convert anything.
> Ok, if I stay with win32 functions, I'll get only UTF-16, if I mess with 
> external text sources (like files) or Ada standard, I'll deal with 
> others text encoding formats like UTF-8, UCS-2, etc...

When dealing with external files, I read them using Stream_IO and recode
them manually to UTF-8 if necessary. Luckily, in recent times the supply
of files encoded in Latin-1, UCS-2 and other increasingly idiotic formats
is almost depleted, so there is little to worry about...
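
The manual recoding step can be sketched with the standard Ada 2012 encoding packages, assuming the raw text has already been read (e.g. with Stream_IO) and its encoding is known: Ada.Strings.UTF_Encoding.Strings.Encode treats a String as Latin-1, which is what Ada's Character is, and Ada.Strings.UTF_Encoding.Wide_Strings.Encode treats a Wide_String as UCS-2. The procedure name is made up.

```ada
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure Recode_Demo is
   use Ada.Strings.UTF_Encoding;

   --  "caf" followed by U+00E9, as Latin-1: one byte per character.
   Latin_1 : constant String := "caf" & Character'Val (16#E9#);

   --  The same text as UCS-2, e.g. decoded from a UCS-2 file.
   UCS_2 : constant Wide_String := "caf" & Wide_Character'Val (16#E9#);

   --  String input is recoded from Latin-1, Wide_String input from UCS-2;
   --  both results are UTF-8 (U+00E9 becomes the two bytes C3 A9).
   From_Latin_1 : constant UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1);
   From_UCS_2   : constant UTF_8_String :=
     Ada.Strings.UTF_Encoding.Wide_Strings.Encode (UCS_2);
begin
   Ada.Text_IO.Put_Line ("Latin-1 bytes:" & Integer'Image (Latin_1'Length));
   Ada.Text_IO.Put_Line ("UTF-8 bytes:" & Integer'Image (From_Latin_1'Length));
end Recode_Demo;
```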

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



end of thread

Thread overview: 10+ messages
2017-08-29 20:28 win32 interfacing check (SetClipboardData) Xavier Petit
2017-08-30 16:04 ` Dmitry A. Kazakov
2017-08-30 18:41   ` Xavier Petit
2017-08-30 21:17     ` Dmitry A. Kazakov
2017-09-01 12:51       ` Xavier Petit
2017-09-01 13:10         ` Dmitry A. Kazakov
2017-09-02  9:38           ` Xavier Petit
2017-09-02 12:29             ` Dmitry A. Kazakov
2017-08-31  1:41   ` Randy Brukardt
2017-09-01 12:53     ` Xavier Petit
