comp.lang.ada
* Re: windows-1251 to utf-8
  @ 2018-10-31 20:58  5%     ` Randy Brukardt
From: Randy Brukardt @ 2018-10-31 20:58 UTC (permalink / raw)


"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
news:prcn4v$d30$1@gioia.aioe.org...
> On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
>> Let's make it easier. For example:
>>
>> ------------------------------------------------------------------
>>
>> with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;
>> with Ada.Text_IO.Unbounded_IO;  use Ada.Text_IO.Unbounded_IO;
>>
>> with AWS.Client;            use AWS.Client;
>> with AWS.Messages;          use AWS.Messages;
>> with AWS.Response;          use AWS.Response;
>>
>> procedure Main is
>>
>>     HTML_Result   : Unbounded_String;
>>     Request_Header_List : Header_List;
>>
>> begin
>>
>>     Request_Header_List.Add
>>       (Name  => "User-Agent",
>>        Value => "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) "
>>                 & "Gecko/20100101 Firefox/56.0");
>>
>>     HTML_Result := Message_Body
>>       (Get (URL => "http://www.sql.ru/", Headers => Request_Header_List));
>>
>>     Put_Line(HTML_Result);
>>
>> end Main;
>>
>> ------------------------------------------------------------------
>>
>> My Linux terminal (default UTF-8) shows: 
>> https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
>>
>> If I set the terminal encoding to Windows-1251, all is well: 
>> https://photos.app.goo.gl/goN5g7uofD8rYLP79
>>
>> Are there standard ways to solve this problem?
>
> What problem? The page uses the content charset=windows-1251. It is legal.
>
> Your program is illegal as it prints the body using Put_Line. The Ada 
> standard requires Character to be Latin-1. The only case in which your 
> program would be correct is when charset=ISO-8859-1.
>
> You must convert the page body according to the encoding specified by the 
> charset key into a string containing UTF-8 octets and use 
> Streams.Stream_IO to write these octets as-is. The conversion for the case 
> of windows-1251 I described earlier. Create a table Character'Pos 
> 0..255 -> Code_Point and use it for each "character" of HTML_Result.
>
> P.S. GNAT Text_IO ignores Latin-1, but that is between GNAT and the 
> underlying OS.
>
> P.P.S. Technically AWS also ignores the Ada standard, but that is 
> established practice, since there is no better way.

Right. Probably the easiest way to do this (using just Ada functions) would 
be to:

 (A) Use Ada.Characters.Conversions.To_Wide_String to convert the To_String 
of the unbounded string to a Wide_String, and then store that in an 
Unbounded_Wide_String (from Ada.Strings.Wide_Unbounded);
 (B) Use Ada.Strings.Wide_Maps to create a character conversion map (the 
conversions were described by another reply);
 (C) Use Ada.Strings.Wide_Unbounded.Translate to apply the mapping from (B) 
to your Unbounded_Wide_String;
 (D) Use Ada.Strings.UTF_Encoding.Wide_Strings.Encode to convert the 
To_Wide_String of your translated Unbounded_Wide_String to UTF-8, presumably 
storing the result into an Unbounded_String.

You potentially could skip (D) if Wide_Text_IO works when sent to 
Standard_Output (I'd expect that on Windows, no idea on Linux). In that 
case, use Wide_Text_IO.Put to send your result.
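
Steps (A) to (D) could be sketched roughly as follows. This is a minimal 
sketch, not a complete converter: the procedure name CP1251_To_UTF8, the 
helper Build_Map, and the sample input are made up for illustration, and the 
map covers only the Cyrillic letters (16#C0# .. 16#FF#, plus 16#A8# and 
16#B8#). The other windows-1251 codes in 16#80# .. 16#BF# would still need 
their own entries. The result is written through a stream so that the UTF-8 
octets go out as-is.

```ada
with Ada.Characters.Conversions;
with Ada.Strings.Unbounded;       use Ada.Strings.Unbounded;
with Ada.Strings.Wide_Unbounded;  use Ada.Strings.Wide_Unbounded;
with Ada.Strings.Wide_Maps;
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO.Text_Streams;

procedure CP1251_To_UTF8 is

   package Maps renames Ada.Strings.Wide_Maps;

   --  (B) Build a PARTIAL windows-1251 map (assumption: Cyrillic letters
   --  only). Bytes 16#C0# .. 16#FF# correspond to U+0410 .. U+044F;
   --  16#A8# is U+0401 and 16#B8# is U+0451.
   function Build_Map return Maps.Wide_Character_Mapping is
      CP_Codes : Wide_String (1 .. 66);
      Unicode  : Wide_String (1 .. 66);
   begin
      for I in 0 .. 63 loop
         CP_Codes (I + 1) := Wide_Character'Val (16#C0# + I);
         Unicode (I + 1)  := Wide_Character'Val (16#0410# + I);
      end loop;
      CP_Codes (65) := Wide_Character'Val (16#A8#);
      Unicode (65)  := Wide_Character'Val (16#0401#);
      CP_Codes (66) := Wide_Character'Val (16#B8#);
      Unicode (66)  := Wide_Character'Val (16#0451#);
      return Maps.To_Mapping (From => CP_Codes, To => Unicode);
   end Build_Map;

   Map : constant Maps.Wide_Character_Mapping := Build_Map;

   --  Hypothetical stand-in for the page body: "Hi " followed by six
   --  windows-1251 bytes that spell a Russian word.
   Raw : constant Unbounded_String := To_Unbounded_String
     ("Hi " & Character'Val (16#CF#) & Character'Val (16#F0#)
      & Character'Val (16#E8#) & Character'Val (16#E2#)
      & Character'Val (16#E5#) & Character'Val (16#F2#));

   --  (A) Widen each byte to a Wide_Character with the same position.
   Wide : Unbounded_Wide_String := To_Unbounded_Wide_String
     (Ada.Characters.Conversions.To_Wide_String (To_String (Raw)));

begin
   --  (C) Replace the windows-1251 positions by Unicode code points.
   Translate (Wide, Map);

   declare
      --  (D) Encode the translated text as UTF-8.
      UTF8 : constant Unbounded_String := To_Unbounded_String
        (Ada.Strings.UTF_Encoding.Wide_Strings.Encode
           (To_Wide_String (Wide)));
   begin
      --  Write the octets unchanged, followed by a line feed.
      String'Write
        (Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output),
         To_String (UTF8) & Character'Val (10));
   end;
end CP1251_To_UTF8;
```

In the original program, Raw would be replaced by HTML_Result. Characters 
outside the map pass through Translate unchanged, which is right for ASCII 
but not for the unmapped codes mentioned above.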

In any case, this shows why Unicode exists, and why anything these days that 
uses non-standard encodings is evil. There's really no short-cut to recoding 
such things, and that makes them maddening.

                                  Randy.






* Re: unicode and wide_text_io
  @ 2017-12-28 22:35  7%           ` G.B.
From: G.B. @ 2017-12-28 22:35 UTC (permalink / raw)


On 28.12.17 16:47, 00120260b@gmail.com wrote:
> Then how come the standard hasn't made it a bit easier to input/output post-Latin-1 characters? Why aren't other standards/character sets/encodings treated more like special cases?
> 

Actually, output of non-7-bit, unambiguously encoded text
has been made reasonably easy, I'd say, also defaulting
to what should be expected:

with Ada.Wide_Text_IO.Text_Streams;
with Ada.Strings.UTF_Encoding.Wide_Strings;

procedure UTF is
    --  USD/EUR, i.e. "$/€"
    Ratio : constant Wide_String := "$/" & Wide_Character'Val (16#20AC#);

    use Ada.Wide_Text_IO, Ada.Strings;
begin
    Put_Line (Ratio); --  use defaults, traditional
    String'Write --  stream output, force UTF-8
      (Text_Streams.Stream (Current_Output),
       UTF_Encoding.Wide_Strings.Encode (Ratio));
end UTF;

The source text above uses only 7-bit characters to build the
post-Latin-1 string; only the comment contains a wide character.

If, instead, the source text is encoded with "more" bits and uses
post-Latin-1 literals or identifiers, then the compiler may need
to be told. I think that BOMs may be of use, and in any case,
there are compiler switches or other vendor-specific mechanisms
for describing the source text encoding.
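
As a minimal sketch of that second case, assuming GNAT and a source file
saved as UTF-8: the unit name UTF_Literal is made up, and -gnatW8 is
GNAT's switch for UTF-8 encoded source text. Other compilers have their
own mechanisms.

```ada
--  Assumption: GNAT, with this file saved as UTF-8 and compiled using
--  the -gnatW8 switch, e.g.  gnatmake -gnatW8 utf_literal.adb
with Ada.Wide_Text_IO;

procedure UTF_Literal is
   --  The euro sign appears directly in the literal instead of being
   --  built with Wide_Character'Val (16#20AC#).
   Ratio : constant Wide_String := "$/€";
begin
   Ada.Wide_Text_IO.Put_Line (Ratio);
end UTF_Literal;
```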



* Re: Convert wide_string to string (as the same byte array)
  2012-03-06 15:54  0%   ` Adam Beneschan
@ 2012-03-07  1:04  0%     ` Randy Brukardt
From: Randy Brukardt @ 2012-03-07  1:04 UTC (permalink / raw)


"Adam Beneschan" <adam@irvine.com> wrote in message 
news:5368448.8.1331049289886.JavaMail.geo-discussion-forums@pbbpr1...
> On Monday, March 5, 2012 5:58:48 PM UTC-8, Randy Brukardt wrote:
>>
>> An alternative to Adam's solution would be to use the Ada2012 encoding
>> functions (A.4.11), specifically Ada.Strings.UTF_Encoding.Wide_Strings,
>> and use a UTF-8 encoding. That would be shorter, but not fixed length, so
>> whether that would work for you depends on the API you are feeding these
>> into.
>
> This may seem like a dumb question, but does that preserve order?

My understanding was that UTF-8 was designed so that ordinary byte 
comparison operations would work "properly" on UTF-8 strings (presuming no 
"overlong encodings" are used; there is no point in such things, it's like 
including NOPs in your generated instructions). That's surely true if only 
equality is involved; I believe it is also true for ordering, but as I've 
never tried it I don't want to say for absolutely certain.
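
A small check of the ordering claim could look like this. It is only a 
sketch with three made-up sample values, not a proof, and it assumes the 
Wide_Strings hold plain 16-bit characters (no UTF-16 surrogate pairs). "<" 
on String compares element by element, so on the encoded strings it is a 
byte-wise comparison.

```ada
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure UTF8_Order is
   use Ada.Strings.UTF_Encoding.Wide_Strings;

   --  Hypothetical samples in ascending code point order; they encode
   --  to 1, 2 and 3 UTF-8 bytes respectively.
   A : constant Wide_String := (1 => Wide_Character'Val (16#007A#));
   B : constant Wide_String := (1 => Wide_Character'Val (16#00E9#));
   C : constant Wide_String := (1 => Wide_Character'Val (16#20AC#));
begin
   --  The wide strings are ordered A < B < C ...
   Ada.Text_IO.Put_Line (Boolean'Image (A < B and B < C));

   --  ... and so are their UTF-8 encodings under byte comparison.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Encode (A) < Encode (B)
                     and Encode (B) < Encode (C)));
end UTF8_Order;
```

Both lines should print TRUE: the encodings start with 16#7A#, 16#C3# and 
16#E2#, which are already in ascending order.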

                                            Randy.






* Re: Convert wide_string to string (as the same byte array)
  2012-03-06  1:58  5% ` Randy Brukardt
@ 2012-03-06 15:54  0%   ` Adam Beneschan
  2012-03-07  1:04  0%     ` Randy Brukardt
From: Adam Beneschan @ 2012-03-06 15:54 UTC (permalink / raw)


On Monday, March 5, 2012 5:58:48 PM UTC-8, Randy Brukardt wrote:
> 
> An alternative to Adam's solution would be to use the Ada2012 encoding 
> functions (A.4.11), specifically Ada.Strings.UTF_Encoding.Wide_Strings, and 
> use a UTF-8 encoding. That would be shorter, but not fixed length, so 
> whether that would work for you depends on the API you are feeding these 
> into.

This may seem like a dumb question, but does that preserve order?

                        -- Adam




* Re: Convert wide_string to string (as the same byte array)
  @ 2012-03-06  1:58  5% ` Randy Brukardt
  2012-03-06 15:54  0%   ` Adam Beneschan
From: Randy Brukardt @ 2012-03-06  1:58 UTC (permalink / raw)


"Erich" <john@peppermind.com> wrote in message 
news:f88cc8ca-183a-40c7-a01c-2adc1137d845@b18g2000vbz.googlegroups.com...
> A newbie question: I need to convert a wide_string to a
> (platform/endian independent) string that represents all the bytes
> of the wide_string. How do you do that?

An alternative to Adam's solution would be to use the Ada2012 encoding 
functions (A.4.11), specifically Ada.Strings.UTF_Encoding.Wide_Strings, and 
use a UTF-8 encoding. That would be shorter, but not fixed length, so 
whether that would work for you depends on the API you are feeding these 
into.

                                           Randy.






2012-02-24 22:01     Convert wide_string to string (as the same byte array) Erich
2012-03-06  1:58  5% ` Randy Brukardt
2012-03-06 15:54  0%   ` Adam Beneschan
2012-03-07  1:04  0%     ` Randy Brukardt
2017-12-27 18:08     unicode and wide_text_io Mehdi Saada
2017-12-28 13:15     ` Mehdi Saada
2017-12-28 14:25       ` Dmitry A. Kazakov
2017-12-28 14:32         ` Simon Wright
2017-12-28 15:28           ` Niklas Holsti
2017-12-28 15:47             ` 00120260b
2017-12-28 22:35  7%           ` G.B.
2018-10-31  2:57     windows-1251 to utf-8 eduardsapotski
2018-10-31 15:28     ` eduardsapotski
2018-10-31 17:01       ` Dmitry A. Kazakov
2018-10-31 20:58  5%     ` Randy Brukardt
