From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: windows-1251 to utf-8
Date: Wed, 31 Oct 2018 18:01:21 +0100
Message-ID: <prcn4v$d30$1@gioia.aioe.org>
In-Reply-To: af207249-39ec-4ef4-9df3-1579af7f6209@googlegroups.com
On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
> Let's make it easier. For example:
>
> ------------------------------------------------------------------
>
> with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
> with Ada.Text_IO.Unbounded_IO; use Ada.Text_IO.Unbounded_IO;
>
> with AWS.Client; use AWS.Client;
> with AWS.Messages; use AWS.Messages;
> with AWS.Response; use AWS.Response;
>
> procedure Main is
>
> HTML_Result : Unbounded_String;
> Request_Header_List : Header_List;
>
> begin
>
> Request_Header_List.Add(Name => "User-Agent", Value => "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0");
>
> HTML_Result := Message_Body(Get(URL => "http://www.sql.ru/", Headers => Request_Header_List));
>
> Put_Line(HTML_Result);
>
> end Main;
>
> ------------------------------------------------------------------
>
> My Linux terminal (UTF-8 by default) shows: https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
>
> If I set the terminal encoding to Windows-1251, all is well: https://photos.app.goo.gl/goN5g7uofD8rYLP79
>
> Are there standard ways to solve this problem?

What problem? The page declares charset=windows-1251, which is perfectly legal.
Your program is the one that is wrong: it prints the body with Put_Line, but
the Ada standard requires Character to be Latin-1. Your program would be
correct only if the page were served with charset=ISO-8859-1.

You must convert the page body, according to the encoding named by the
charset key, into a string containing UTF-8 octets, and then use
Ada.Streams.Stream_IO to write these octets as-is. I described the
conversion for windows-1251 earlier: create a table mapping Character'Pos
0..255 to Code_Point, look up each "character" of HTML_Result in it, and
encode each resulting code point as UTF-8.
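
A minimal sketch of that recipe, in Ada to match the thread. It is
illustrative only: the procedure CP1251_Demo, the helpers Make_Table and
CP1251_To_UTF8, the local Code_Point type, the sample string and the output
file page.txt are all hypothetical. The table values follow the published
Windows-1251 code chart, and mapping the unassigned octet 16#98# to U+FFFD
is an assumption. Since Windows-1251 maps entirely into the Basic
Multilingual Plane, every input octet becomes at most three UTF-8 octets.

```ada
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

--  Hypothetical demo: all names and the output file are illustrative only.
procedure CP1251_Demo is

   --  Windows-1251 maps entirely into the BMP, so 16 bits suffice
   --  and each octet becomes at most three UTF-8 octets.
   type Code_Point is range 0 .. 16#FFFF#;

   type CP1251_Table is array (0 .. 255) of Code_Point;

   --  Code points for octets 16#80# .. 16#BF#, per the Windows-1251 chart.
   --  16#98# is unassigned; mapping it to U+FFFD is an assumption.
   High : constant array (16#80# .. 16#BF#) of Code_Point :=
     (16#0402#, 16#0403#, 16#201A#, 16#0453#, 16#201E#, 16#2026#, 16#2020#, 16#2021#,
      16#20AC#, 16#2030#, 16#0409#, 16#2039#, 16#040A#, 16#040C#, 16#040B#, 16#040F#,
      16#0452#, 16#2018#, 16#2019#, 16#201C#, 16#201D#, 16#2022#, 16#2013#, 16#2014#,
      16#FFFD#, 16#2122#, 16#0459#, 16#203A#, 16#045A#, 16#045C#, 16#045B#, 16#045F#,
      16#00A0#, 16#040E#, 16#045E#, 16#0408#, 16#00A4#, 16#0490#, 16#00A6#, 16#00A7#,
      16#0401#, 16#00A9#, 16#0404#, 16#00AB#, 16#00AC#, 16#00AD#, 16#00AE#, 16#0407#,
      16#00B0#, 16#00B1#, 16#0406#, 16#0456#, 16#0491#, 16#00B5#, 16#00B6#, 16#00B7#,
      16#0451#, 16#2116#, 16#0454#, 16#00BB#, 16#0458#, 16#0405#, 16#0455#, 16#0457#);

   --  Build the Character'Pos 0..255 -> Code_Point table.
   function Make_Table return CP1251_Table is
      T : CP1251_Table;
   begin
      for Pos in T'Range loop
         if Pos < 16#80# then
            T (Pos) := Code_Point (Pos);                      --  ASCII half unchanged
         elsif Pos < 16#C0# then
            T (Pos) := High (Pos);
         else
            T (Pos) := 16#0410# + Code_Point (Pos - 16#C0#);  --  U+0410 .. U+044F
         end if;
      end loop;
      return T;
   end Make_Table;

   To_Code_Point : constant CP1251_Table := Make_Table;

   --  Convert Windows-1251 octets into a String holding UTF-8 octets.
   function CP1251_To_UTF8 (Source : String) return String is
      Result : String (1 .. 3 * Source'Length);
      Last   : Natural := 0;

      --  Append one octet (always 0 .. 255 here) to Result.
      procedure Put (Octet : Code_Point) is
      begin
         Last := Last + 1;
         Result (Last) := Character'Val (Octet);
      end Put;
   begin
      for C of Source loop
         declare
            Code : constant Code_Point := To_Code_Point (Character'Pos (C));
         begin
            if Code < 16#80# then                 --  1 octet
               Put (Code);
            elsif Code < 16#800# then             --  2 octets
               Put (16#C0# + Code / 64);
               Put (16#80# + Code mod 64);
            else                                  --  3 octets
               Put (16#E0# + Code / 4096);
               Put (16#80# + (Code / 64) mod 64);
               Put (16#80# + Code mod 64);
            end if;
         end;
      end loop;
      return Result (1 .. Last);
   end CP1251_To_UTF8;

   --  Hypothetical stand-in for the page body: "Privet" in Windows-1251.
   Sample : constant String :=
     (Character'Val (16#CF#), Character'Val (16#F0#), Character'Val (16#E8#),
      Character'Val (16#E2#), Character'Val (16#E5#), Character'Val (16#F2#));

   File : File_Type;
begin
   --  Write the UTF-8 octets as-is; String'Write emits no array bounds.
   Create (File, Out_File, "page.txt");
   String'Write (Stream (File), CP1251_To_UTF8 (Sample));
   Close (File);
end CP1251_Demo;
```

In the original program one would pass To_String (HTML_Result) instead of
Sample, and a real program should pick the table from the charset key
rather than assume windows-1251. To send the octets to the terminal instead
of a file, Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output)
should give a stream that String'Write can use in the same way.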
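
P.S. GNAT's Text_IO ignores the Latin-1 requirement and writes the octets of
a String unchanged, so what you see depends on the terminal's encoding. That
is between GNAT and the underlying OS.

P.P.S. Technically, AWS also ignores the Ada standard by returning raw octets
in a String whatever the charset. But that is established practice, since
there is no better way.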
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Thread overview: 11+ messages
2018-10-31 2:57 windows-1251 to utf-8 eduardsapotski
2018-10-31 6:09 ` gautier_niouzes
2018-10-31 10:01 ` Dmitry A. Kazakov
2018-10-31 15:28 ` eduardsapotski
2018-10-31 16:50 ` Shark8
2018-10-31 17:01 ` Dmitry A. Kazakov [this message]
2018-10-31 20:58 ` Randy Brukardt
2018-11-01 12:49 ` Björn Lundin
2018-11-01 13:26 ` Dmitry A. Kazakov
2018-11-01 14:34 ` Björn Lundin
2018-11-01 18:14 ` Vadim Godunko