comp.lang.ada
From: "Björn Lundin" <b.f.lundin@gmail.com>
Subject: Re: windows-1251 to utf-8
Date: Thu, 1 Nov 2018 13:49:00 +0100
Message-ID: <pret1g$npm$1@dont-email.me>
In-Reply-To: <af207249-39ec-4ef4-9df3-1579af7f6209@googlegroups.com>

On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
> Let's make it easier. For example:
> 
> ------------------------------------------------------------------
> 
> with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;
> with Ada.Text_IO.Unbounded_IO;  use Ada.Text_IO.Unbounded_IO;
> 
> with AWS.Client;            use AWS.Client;
> with AWS.Messages;          use AWS.Messages;
> with AWS.Response;          use AWS.Response;
> 
> procedure Main is
> 
>    HTML_Result   : Unbounded_String;
>    Request_Header_List : Header_List;
> 
> begin
> 
>    Request_Header_List.Add(Name => "User-Agent", Value => "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0");
> 
>    HTML_Result := Message_Body(Get(URL => "http://www.sql.ru/", Headers => Request_Header_List));
> 
>    Put_Line(HTML_Result);
> 
> end Main;
> 
> ------------------------------------------------------------------
> 
> My Linux terminal (default UTF-8) shows: https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
> 
> If I set the terminal encoding to Windows-1251, all is well: https://photos.app.goo.gl/goN5g7uofD8rYLP79
> 
> Are there standard ways to solve this problem?
> 


XML/Ada includes Unicode packages that can do this.

Something like the following (it converts from Latin-1; it still has to be changed to use Windows-1251 instead):

with Unicode.Ces.Utf8, Unicode.Ces.Utf32, Unicode.Ces.Basic_8bit,
     Unicode.Ccs.ISO_8859_1;
use Unicode, Unicode.Ccs, Unicode.Ces, Unicode.Ces.Utf8, Unicode.Ces.Utf32;

--  Some of these with clauses are probably not needed;
--  the code was copied from a bigger function.


 function To_Utf_8_From_Latin_1_Little_Endian
     (A_Latin_1_Encoded_String : in String)
      return String is

    --  Widen each 8-bit Latin-1 character to a 32-bit code unit
    --  (the values are still Latin-1 code points)
    S_32 : constant Unicode.Ces.Utf32.Utf32_Le_String :=
       Unicode.Ces.Basic_8bit.To_Utf32 (A_Latin_1_Encoded_String);

    --  UTF-32 string (map the Latin-1 code points to Unicode characters)
    U_32 : constant Unicode.Ces.Utf32.Utf32_Le_String :=
       Unicode.Ces.Utf32.To_Unicode_Le
          (S_32,
           Cs => Unicode.Ccs.ISO_8859_1.ISO_8859_1_Character_Set);

    --  Convert UTF-32 to UTF-8
    An_Utf_8_Encoded_String_Le : constant Unicode.Ces.Utf8.Utf8_String :=
       Unicode.Ces.Utf8.From_Utf32 (U_32);

  begin
    return An_Utf_8_Encoded_String_Le;
  end To_Utf_8_From_Latin_1_Little_Endian;

---------------------------------------------------------------------------------
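For Windows-1251 the same three steps should carry over with only the character set swapped. The following is an untested sketch, not a verified implementation: it assumes that your XML/Ada installation provides a package Unicode.CCS.Windows_1251 exporting Windows_1251_Character_Set (named by analogy with the ISO_8859_1 package used above; check your version). The function name To_Utf_8_From_Windows_1251 is made up for this example.

```ada
with Unicode.CES.Basic_8bit;
with Unicode.CES.Utf32;
with Unicode.CES.Utf8;
with Unicode.CCS.Windows_1251;  --  ASSUMPTION: present in your XML/Ada version

--  Hypothetical helper: Windows-1251 bytes in, UTF-8 bytes out.
function To_Utf_8_From_Windows_1251
  (A_1251_Encoded_String : in String) return String
is
   --  Widen each 8-bit character to a 32-bit code unit;
   --  the values are still Windows-1251 code points.
   S_32 : constant Unicode.CES.Utf32.Utf32_LE_String :=
     Unicode.CES.Basic_8bit.To_Utf32 (A_1251_Encoded_String);

   --  Map the Windows-1251 code points to Unicode code points.
   --  ASSUMPTION: Windows_1251_Character_Set is the exported name.
   U_32 : constant Unicode.CES.Utf32.Utf32_LE_String :=
     Unicode.CES.Utf32.To_Unicode_LE
       (S_32,
        Cs => Unicode.CCS.Windows_1251.Windows_1251_Character_Set);
begin
   --  Re-encode the UTF-32 string as UTF-8.
   return Unicode.CES.Utf8.From_Utf32 (U_32);
end To_Utf_8_From_Windows_1251;
```

In the quoted program this would probably be used as Ada.Text_IO.Put_Line (To_Utf_8_From_Windows_1251 (To_String (HTML_Result))); so that the UTF-8 terminal receives UTF-8 bytes.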


It's a starting point.

-- 
--
Björn
