comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: windows-1251 to utf-8
Date: Wed, 31 Oct 2018 18:01:21 +0100
Message-ID: <prcn4v$d30$1@gioia.aioe.org>
In-Reply-To: af207249-39ec-4ef4-9df3-1579af7f6209@googlegroups.com

On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
> Let's make it easier. For example:
> 
> ------------------------------------------------------------------
> 
> with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;
> with Ada.Text_IO.Unbounded_IO;  use Ada.Text_IO.Unbounded_IO;
> 
> with AWS.Client;            use AWS.Client;
> with AWS.Messages;          use AWS.Messages;
> with AWS.Response;          use AWS.Response;
> 
> procedure Main is
> 
>     HTML_Result   : Unbounded_String;
>     Request_Header_List : Header_List;
> 
> begin
> 
>     Request_Header_List.Add(Name => "User-Agent", Value => "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0");
> 
>     HTML_Result := Message_Body(Get(URL => "http://www.sql.ru/", Headers => Request_Header_List));
> 
>     Put_Line(HTML_Result);
> 
> end Main;
> 
> ------------------------------------------------------------------
> 
> My Linux terminal (default UTF-8) shows: https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
> 
> If I set the terminal encoding to Windows-1251, all is well: https://photos.app.goo.gl/goN5g7uofD8rYLP79
> 
> Is there a standard way to solve this problem?

What problem? The page declares its content as charset=windows-1251.
That is legal.

Your program is incorrect because it prints the body using Put_Line.
The Ada standard requires Character to be Latin-1, so the only case in
which your program would be correct is charset=ISO-8859-1.
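To illustrate why only ISO-8859-1 would work: the octet 16#CF# is the
Cyrillic letter PE in windows-1251, but as an Ada Character the standard
makes it Latin-1 LATIN CAPITAL LETTER I WITH DIAERESIS, and the standard
Latin-1 to UTF-8 encoder Ada.Strings.UTF_Encoding.Strings.Encode agrees.
A small sketch; the procedure name Latin_1_Reading is hypothetical.

```ada
with Ada.Characters.Latin_1;
with Ada.Integer_Text_IO;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Text_IO;

procedure Latin_1_Reading is
   --  The octet that windows-1251 uses for the Cyrillic capital letter PE.
   Octet : constant Character := Character'Val (16#CF#);

   --  Strings.Encode treats each Character as a Latin-1 code point.
   As_UTF_8 : constant String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (String'(1 => Octet));
begin
   --  For Ada, this Character is LATIN CAPITAL LETTER I WITH DIAERESIS.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Octet = Ada.Characters.Latin_1.UC_I_Diaeresis));

   --  Its UTF-8 form is 16#C3# 16#8F# (U+00CF), not 16#D0# 16#9F# (U+041F).
   for C of As_UTF_8 loop
      Ada.Integer_Text_IO.Put (Character'Pos (C), Width => 7, Base => 16);
   end loop;
   Ada.Text_IO.New_Line;
end Latin_1_Reading;
```

In other words, reading windows-1251 octets as Ada Characters silently
gives Latin-1 letters, which is correct only when the page really is
ISO-8859-1.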

You must convert the page body, according to the encoding specified by
the charset key, into a string containing UTF-8 octets, and use
Ada.Streams.Stream_IO to write these octets as-is. I described the
conversion for windows-1251 earlier: create a table mapping
Character'Pos 0..255 to Code_Point and apply it to each "character" of
HTML_Result.
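A minimal sketch of that table-driven conversion, assuming the body is
already known to be windows-1251 (a real program would check the charset
key first). The names Win1251_To_UTF8, Upper, Table and To_UTF_8 are
hypothetical. The table values follow the published windows-1251
mapping; sending the unassigned octet 16#98# to U+FFFD is an assumption
of this sketch. Two substitutions are made: the UTF-8 octets come from
the standard Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode, and they
are written through Ada.Text_IO.Text_Streams rather than
Ada.Streams.Stream_IO, since Stream_IO has no portable handle for
standard output.

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO.Text_Streams;

procedure Win1251_To_UTF8 is

   --  Code points for windows-1251 octets 16#80# .. 16#BF#, the only
   --  irregular part of the code page. 16#98# is unassigned; mapping it
   --  to U+FFFD (replacement character) is an assumption of this sketch.
   type Upper_Block is array (16#80# .. 16#BF#) of Natural;
   Upper : constant Upper_Block :=
     (16#0402#, 16#0403#, 16#201A#, 16#0453#, 16#201E#, 16#2026#, 16#2020#, 16#2021#,
      16#20AC#, 16#2030#, 16#0409#, 16#2039#, 16#040A#, 16#040C#, 16#040B#, 16#040F#,
      16#0452#, 16#2018#, 16#2019#, 16#201C#, 16#201D#, 16#2022#, 16#2013#, 16#2014#,
      16#FFFD#, 16#2122#, 16#0459#, 16#203A#, 16#045A#, 16#045C#, 16#045B#, 16#045F#,
      16#00A0#, 16#040E#, 16#045E#, 16#0408#, 16#00A4#, 16#0490#, 16#00A6#, 16#00A7#,
      16#0401#, 16#00A9#, 16#0404#, 16#00AB#, 16#00AC#, 16#00AD#, 16#00AE#, 16#0407#,
      16#00B0#, 16#00B1#, 16#0406#, 16#0456#, 16#0491#, 16#00B5#, 16#00B6#, 16#00B7#,
      16#0451#, 16#2116#, 16#0454#, 16#00BB#, 16#0458#, 16#0405#, 16#0455#, 16#0457#);

   --  Character'Pos 0 .. 255 -> Code_Point.
   type Code_Table is array (0 .. 255) of Natural;

   function Build_Table return Code_Table is
      T : Code_Table;
   begin
      for P in T'Range loop
         case P is
            when 16#00# .. 16#7F# => T (P) := P;               --  ASCII is unchanged
            when 16#80# .. 16#BF# => T (P) := Upper (P);       --  irregular block
            when others => T (P) := 16#0410# + (P - 16#C0#);   --  U+0410 .. U+044F
         end case;
      end loop;
      return T;
   end Build_Table;

   Table : constant Code_Table := Build_Table;

   --  Decode each windows-1251 octet through Table, then let the
   --  standard library produce the UTF-8 octets.
   function To_UTF_8 (Source : String) return String is
      Decoded : Wide_Wide_String (Source'Range);
   begin
      for I in Source'Range loop
         Decoded (I) :=
           Wide_Wide_Character'Val (Table (Character'Pos (Source (I))));
      end loop;
      return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Decoded);
   end To_UTF_8;

   --  The Russian word "Privet" (hello) as windows-1251 octets.
   Sample : constant String :=
     Character'Val (16#CF#) & Character'Val (16#F0#) & Character'Val (16#E8#)
     & Character'Val (16#E2#) & Character'Val (16#E5#) & Character'Val (16#F2#);

   UTF_8 : constant String := To_UTF_8 (Sample);
begin
   --  Write the UTF-8 octets as-is, followed by a line feed.
   String'Write
     (Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output),
      UTF_8 & Character'Val (10));
end Win1251_To_UTF8;
```

In the original program, To_UTF_8 would be moved into a package and
applied to To_String (HTML_Result) before writing; a page with a
different charset needs a different table.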

P.S. GNAT's Text_IO ignores Latin-1 and writes the octets unchanged,
but that is between GNAT and the underlying OS.

P.P.S. Technically, AWS also ignores the Ada standard, but that is
established practice, since there is no better way.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de


Thread overview: 11+ messages
2018-10-31  2:57 windows-1251 to utf-8 eduardsapotski
2018-10-31  6:09 ` gautier_niouzes
2018-10-31 10:01 ` Dmitry A. Kazakov
2018-10-31 15:28 ` eduardsapotski
2018-10-31 16:50   ` Shark8
2018-10-31 17:01   ` Dmitry A. Kazakov [this message]
2018-10-31 20:58     ` Randy Brukardt
2018-11-01 12:49   ` Björn Lundin
2018-11-01 13:26     ` Dmitry A. Kazakov
2018-11-01 14:34       ` Björn Lundin
2018-11-01 18:14 ` Vadim Godunko