From: Björn Lundin
Newsgroups: comp.lang.ada
Subject: Re: windows-1251 to utf-8
Date: Thu, 1 Nov 2018 13:49:00 +0100

On 2018-10-31 16:28, eduardsapotski@gmail.com wrote:
> Let's make it easier.
> For example:
>
> ------------------------------------------------------------------
>
> with Ada.Strings.Unbounded;     use Ada.Strings.Unbounded;
> with Ada.Text_IO.Unbounded_IO;  use Ada.Text_IO.Unbounded_IO;
>
> with AWS.Client;    use AWS.Client;
> with AWS.Messages;  use AWS.Messages;
> with AWS.Response;  use AWS.Response;
>
> procedure Main is
>
>    HTML_Result         : Unbounded_String;
>    Request_Header_List : Header_List;
>
> begin
>
>    Request_Header_List.Add(Name => "User-Agent", Value => "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0");
>
>    HTML_Result := Message_Body(Get(URL => "http://www.sql.ru/", Headers => Request_Header_List));
>
>    Put_Line(HTML_Result);
>
> end Main;
>
> ------------------------------------------------------------------
>
> My Linux terminal (UTF-8 by default) shows: https://photos.app.goo.gl/EPgwKoiFSuwkJvgSA
>
> If I set the terminal encoding to Windows-1251, all is well: https://photos.app.goo.gl/goN5g7uofD8rYLP79
>
> Are there standard ways to solve this problem?
>

XML/Ada comes with Unicode packages (Unicode.Ces.* for encodings, Unicode.Ccs.* for character sets) that can do this conversion.
Something like the following. It converts from Latin-1; for Windows-1251 the character set passed as Cs (the only Latin-1 specific part) has to be changed:

with Unicode.Ces.Utf8;
with Unicode.Ces.Utf32;
with Unicode.Ces.Basic_8bit;
with Unicode.Ccs.ISO_8859_1;

function To_Utf_8_From_Latin_1_Little_Endian
  (A_Latin_1_Encoded_String : in String) return String
is
   -- Widen every byte to a 4-byte UTF-32LE unit in an ordinary Ada String;
   -- the values are still Latin-1 codes, not yet Unicode code points
   S_32 : constant Unicode.Ces.Utf32.Utf32_Le_String :=
     Unicode.Ces.Basic_8bit.To_Utf32 (A_Latin_1_Encoded_String);

   -- Map the Latin-1 codes to Unicode code points using the character set Cs
   U_32 : constant Unicode.Ces.Utf32.Utf32_Le_String :=
     Unicode.Ces.Utf32.To_Unicode_Le
       (S_32, Cs => Unicode.Ccs.ISO_8859_1.ISO_8859_1_Character_Set);

   -- Re-encode the UTF-32LE code points as UTF-8
   An_Utf_8_Encoded_String : constant Unicode.Ces.Utf8.Utf8_String :=
     Unicode.Ces.Utf8.From_Utf32 (U_32);
begin
   return An_Utf_8_Encoded_String;
end To_Utf_8_From_Latin_1_Little_Endian;

It's a starting point.

--
Björn
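If pulling in XML/Ada only for this feels heavy, Ada 2012 can do it with the standard library alone. Map each Windows-1251 byte to its Unicode code point, build a Wide_String, and let Ada.Strings.UTF_Encoding.Wide_Strings.Encode produce UTF-8. This sketch is not from the thread. The names Demo_Windows_1251 and To_Utf_8_From_Windows_1251 are hypothetical. The table for bytes 16#80# .. 16#BF# was transcribed from the Windows-1251 code page, so check it against the official mapping before relying on it. Mapping the unassigned byte 16#98# to U+FFFD is a choice made for this sketch.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Strings;

--  Sketch only: Demo_Windows_1251 and To_Utf_8_From_Windows_1251 are
--  hypothetical names, not part of AWS, XML/Ada or the Ada library.
procedure Demo_Windows_1251 is

   --  Unicode code points for Windows-1251 bytes 16#80# .. 16#BF#, the
   --  irregular part of the code page. Byte 16#98# is unassigned; this
   --  sketch maps it to U+FFFD (REPLACEMENT CHARACTER).
   type Code_Point_Table is array (Natural range 16#80# .. 16#BF#) of Natural;

   High_Half : constant Code_Point_Table :=
     (16#0402#, 16#0403#, 16#201A#, 16#0453#, 16#201E#, 16#2026#, 16#2020#, 16#2021#,
      16#20AC#, 16#2030#, 16#0409#, 16#2039#, 16#040A#, 16#040C#, 16#040B#, 16#040F#,
      16#0452#, 16#2018#, 16#2019#, 16#201C#, 16#201D#, 16#2022#, 16#2013#, 16#2014#,
      16#FFFD#, 16#2122#, 16#0459#, 16#203A#, 16#045A#, 16#045C#, 16#045B#, 16#045F#,
      16#00A0#, 16#040E#, 16#045E#, 16#0408#, 16#00A4#, 16#0490#, 16#00A6#, 16#00A7#,
      16#0401#, 16#00A9#, 16#0404#, 16#00AB#, 16#00AC#, 16#00AD#, 16#00AE#, 16#0407#,
      16#00B0#, 16#00B1#, 16#0406#, 16#0456#, 16#0491#, 16#00B5#, 16#00B6#, 16#00B7#,
      16#0451#, 16#2116#, 16#0454#, 16#00BB#, 16#0458#, 16#0405#, 16#0455#, 16#0457#);

   --  "Privet" (hello) in Cyrillic, encoded as Windows-1251 bytes
   Sample : constant String :=
     Character'Val (16#CF#) & Character'Val (16#F0#) & Character'Val (16#E8#)
     & Character'Val (16#E2#) & Character'Val (16#E5#) & Character'Val (16#F2#);

   --  Converts a Windows-1251 encoded String to a UTF-8 encoded String.
   function To_Utf_8_From_Windows_1251 (Source : String) return String is
      Wide : Wide_String (Source'Range);
      Code : Natural;
   begin
      for I in Source'Range loop
         Code := Character'Pos (Source (I));
         if Code < 16#80# then
            --  16#00# .. 16#7F#: identical to ASCII
            Wide (I) := Wide_Character'Val (Code);
         elsif Code < 16#C0# then
            --  16#80# .. 16#BF#: irregular block, use the table
            Wide (I) := Wide_Character'Val (High_Half (Code));
         else
            --  16#C0# .. 16#FF#: Cyrillic U+0410 .. U+044F, in order
            Wide (I) := Wide_Character'Val (16#0410# + (Code - 16#C0#));
         end if;
      end loop;
      --  Encode the BMP code points as UTF-8 (Ada 2012, RM A.4.11)
      return Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Wide);
   end To_Utf_8_From_Windows_1251;

begin
   Ada.Text_IO.Put_Line (To_Utf_8_From_Windows_1251 (Sample));
end Demo_Windows_1251;
```

In the original program the call would go roughly like Ada.Text_IO.Put_Line (To_Utf_8_From_Windows_1251 (To_String (HTML_Result))). That only helps if the page body really is Windows-1251. Switching the terminal to Windows-1251 fixes the display, which suggests it is, but the charset in the server's Content-Type header is the place to confirm it.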