From: "Björn Lundin" <b.f.lundin@gmail.com>
Subject: Re: windows-1251 to utf-8
Date: Thu, 1 Nov 2018 13:49:00 +0100
Message-ID: <pret1g$npm$1@dont-email.me>
In-Reply-To: <af207249-39ec-4ef4-9df3-1579af7f6209@googlegroups.com>
On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
> Let's make it easier. For example:
>
> ------------------------------------------------------------------
>
> with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
> with Ada.Text_IO.Unbounded_IO; use Ada.Text_IO.Unbounded_IO;
>
> with AWS.Client; use AWS.Client;
> with AWS.Messages; use AWS.Messages;
> with AWS.Response; use AWS.Response;
>
> procedure Main is
>
> HTML_Result : Unbounded_String;
> Request_Header_List : Header_List;
>
> begin
>
> Request_Header_List.Add(Name => "User-Agent", Value => "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0");
>
> HTML_Result := Message_Body(Get(URL => "http://www.sql.ru/", Headers => Request_Header_List));
>
> Put_Line(HTML_Result);
>
> end Main;
>
> ------------------------------------------------------------------
>
> My Linux terminal (UTF-8 by default) shows: https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
>
> If I set the terminal encoding to Windows-1251, all is well: https://photos.app.goo.gl/goN5g7uofD8rYLP79
>
> Are there standard ways to solve this problem?
>
XML/Ada has Unicode packages. Something like this (the Latin_1 parts still
need to be changed to Windows-1251):
with Unicode.Ces.Utf8, Unicode.Ces.Utf32, Unicode.Ces.Basic_8bit,
     Unicode.Ccs.ISO_8859_1;
use Unicode, Unicode.Ccs, Unicode.Ces, Unicode.Ces.Utf8, Unicode.Ces.Utf32;
-- some of these with clauses are likely not needed;
-- the code was copied from a bigger function
function To_Utf_8_From_Latin_1_Little_Endian
  (A_Latin_1_Encoded_String : in String)
   return String is
   -- 32-bit Latin-1 string (normal Ada string with 32-bit characters)
   S_32 : Unicode.Ces.Utf32.Utf32_Le_String :=
     Unicode.Ces.Basic_8bit.To_Utf32 (A_Latin_1_Encoded_String);
   -- UTF-32 string (convert Latin-1 to Unicode characters)
   U_32 : Unicode.Ces.Utf32.Utf32_Le_String :=
     Unicode.Ces.Utf32.To_Unicode_Le
       (S_32,
        Cs => Unicode.Ccs.ISO_8859_1.ISO_8859_1_Character_Set);
   -- change UTF-32 to UTF-8
   An_Utf_8_Encoded_String_Le : Unicode.Ces.Utf8.Utf8_String :=
     Unicode.Ces.Utf8.From_Utf32 (U_32);
begin
   return An_Utf_8_Encoded_String_Le;
end To_Utf_8_From_Latin_1_Little_Endian;
---------------------------------------------------------------------------------
It's a starting point.
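For the Windows-1251 case specifically, the conversion can also be done
without XML/Ada. What follows is a minimal sketch, assuming an Ada 2012
compiler and that the page really is Windows-1251 encoded (as the terminal
test in the question suggests). The function name To_UTF_8_From_Windows_1251
is made up for this example, and the lookup table was typed in by hand from
the Windows-1251 code page, so check it before relying on it. Bytes below
16#80# are ASCII, bytes 16#C0# .. 16#FF# are the contiguous Cyrillic letters
U+0410 .. U+044F, and only 16#80# .. 16#BF# need a table; the decoded
Wide_String is then turned into UTF-8 with
Ada.Strings.UTF_Encoding.Wide_Strings.Encode.

```ada
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Strings;

--  Hypothetical helper (not part of AWS or XML/Ada): decode a
--  Windows-1251 byte string and re-encode it as UTF-8.
function To_UTF_8_From_Windows_1251
  (Item : String) return Ada.Strings.UTF_Encoding.UTF_8_String
is
   --  Code points for bytes 16#80# .. 16#BF#, which follow no simple
   --  pattern. 16#98# is unassigned in Windows-1251; mapping it to
   --  U+FFFD (replacement character) is an assumption of this sketch.
   type Code_Table is array (Natural range 16#80# .. 16#BF#) of Natural;
   Upper_Half : constant Code_Table :=
     (16#0402#, 16#0403#, 16#201A#, 16#0453#,   --  80 .. 83
      16#201E#, 16#2026#, 16#2020#, 16#2021#,   --  84 .. 87
      16#20AC#, 16#2030#, 16#0409#, 16#2039#,   --  88 .. 8B
      16#040A#, 16#040C#, 16#040B#, 16#040F#,   --  8C .. 8F
      16#0452#, 16#2018#, 16#2019#, 16#201C#,   --  90 .. 93
      16#201D#, 16#2022#, 16#2013#, 16#2014#,   --  94 .. 97
      16#FFFD#, 16#2122#, 16#0459#, 16#203A#,   --  98 .. 9B
      16#045A#, 16#045C#, 16#045B#, 16#045F#,   --  9C .. 9F
      16#00A0#, 16#040E#, 16#045E#, 16#0408#,   --  A0 .. A3
      16#00A4#, 16#0490#, 16#00A6#, 16#00A7#,   --  A4 .. A7
      16#0401#, 16#00A9#, 16#0404#, 16#00AB#,   --  A8 .. AB
      16#00AC#, 16#00AD#, 16#00AE#, 16#0407#,   --  AC .. AF
      16#00B0#, 16#00B1#, 16#0406#, 16#0456#,   --  B0 .. B3
      16#0491#, 16#00B5#, 16#00B6#, 16#00B7#,   --  B4 .. B7
      16#0451#, 16#2116#, 16#0454#, 16#00BB#,   --  B8 .. BB
      16#0458#, 16#0405#, 16#0455#, 16#0457#);  --  BC .. BF

   --  One Wide_Character per input byte (Windows-1251 is single-byte)
   Decoded : Wide_String (1 .. Item'Length);
   J       : Natural := 0;
begin
   for C of Item loop
      J := J + 1;
      declare
         Code : constant Natural := Character'Pos (C);
      begin
         if Code < 16#80# then
            --  ASCII part is identical in Windows-1251 and Unicode
            Decoded (J) := Wide_Character'Val (Code);
         elsif Code < 16#C0# then
            --  Irregular range: look the code point up
            Decoded (J) := Wide_Character'Val (Upper_Half (Code));
         else
            --  16#C0# .. 16#FF# maps straight onto U+0410 .. U+044F
            Decoded (J) := Wide_Character'Val (Code - 16#C0# + 16#0410#);
         end if;
      end;
   end loop;
   --  Encode the UCS-2 string as UTF-8 (no BOM by default)
   return Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Decoded);
end To_UTF_8_From_Windows_1251;
```

In the program from the question, one would then add "with Ada.Text_IO;" and
print with Ada.Text_IO.Put_Line (To_UTF_8_From_Windows_1251 (To_String
(HTML_Result))) instead of the Unbounded_IO Put_Line.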
--
--
Björn