From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: windows-1251 to utf-8
Date: Wed, 31 Oct 2018 11:01:47 +0100
References: <74537c7a-18dd-421a-b3c2-6919285006cd@googlegroups.com>

On 2018-10-31 03:57, eduardsapotski@gmail.com wrote:

> I get HTML from web-server in windows-1251 encoding.
> How do I convert HTML in windows-1251 to utf-8?

The encoding table is this:

   https://en.wikipedia.org/wiki/Windows-1251

The 7-bit codes correspond to UTF-8 directly. For 8-bit codes (for all codes, actually) you take the Unicode code point from the table, e.g. Cyrillic capital Ц -> 16#0426#, and convert it to a UTF-8 sequence using, for example, this:

   http://www.dmitry-kazakov.de/ada/strings_edit.htm#7

The function Strings_Edit.UTF8.Image takes a code point and returns its UTF-8 equivalent, so

   Strings_Edit.UTF8.Image (16#0426#)

gives Ц in UTF-8.

HTML is an unrelated story. Do you mean RFC 2396 escape sequences (the %XX form used in URIs)? This is an alternative representation that has nothing to do with Windows-1251.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
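
The table lookup described in the post can be sketched as a small self-contained program. This is only an illustration under a few assumptions: the names CP1251_Demo, High, To_Code_Point and To_UTF8 are made up for the example; the code points for bytes 16#80# .. 16#BF# were transcribed by hand from the Windows-1251 table and should be checked against it; the unassigned byte 16#98# is mapped to U+FFFD as an arbitrary choice. To compile without extra libraries, it uses the standard Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Strings for the final encoding step instead of Strings_Edit.UTF8.Image.

```ada
with Ada.Strings.UTF_Encoding.Wide_Strings;
with Ada.Text_IO;

procedure CP1251_Demo is

   --  Unicode code points for bytes 16#80# .. 16#BF#, transcribed by hand
   --  from the Windows-1251 table (an assumption worth double-checking).
   --  Byte 16#98# is unassigned; U+FFFD stands in for it here.
   type Code_Table is array (Natural range 16#80# .. 16#BF#) of Natural;

   High : constant Code_Table :=
     (16#0402#, 16#0403#, 16#201A#, 16#0453#,   --  80 .. 83
      16#201E#, 16#2026#, 16#2020#, 16#2021#,   --  84 .. 87
      16#20AC#, 16#2030#, 16#0409#, 16#2039#,   --  88 .. 8B
      16#040A#, 16#040C#, 16#040B#, 16#040F#,   --  8C .. 8F
      16#0452#, 16#2018#, 16#2019#, 16#201C#,   --  90 .. 93
      16#201D#, 16#2022#, 16#2013#, 16#2014#,   --  94 .. 97
      16#FFFD#, 16#2122#, 16#0459#, 16#203A#,   --  98 .. 9B
      16#045A#, 16#045C#, 16#045B#, 16#045F#,   --  9C .. 9F
      16#00A0#, 16#040E#, 16#045E#, 16#0408#,   --  A0 .. A3
      16#00A4#, 16#0490#, 16#00A6#, 16#00A7#,   --  A4 .. A7
      16#0401#, 16#00A9#, 16#0404#, 16#00AB#,   --  A8 .. AB
      16#00AC#, 16#00AD#, 16#00AE#, 16#0407#,   --  AC .. AF
      16#00B0#, 16#00B1#, 16#0406#, 16#0456#,   --  B0 .. B3
      16#0491#, 16#00B5#, 16#00B6#, 16#00B7#,   --  B4 .. B7
      16#0451#, 16#2116#, 16#0454#, 16#00BB#,   --  B8 .. BB
      16#0458#, 16#0405#, 16#0455#, 16#0457#);  --  BC .. BF

   --  Map one Windows-1251 byte to its Unicode code point.
   function To_Code_Point (Byte : Character) return Natural is
      B : constant Natural := Character'Pos (Byte);
   begin
      if B < 16#80# then
         return B;                        --  7-bit codes map to themselves
      elsif B < 16#C0# then
         return High (B);                 --  irregular part: table lookup
      else
         return 16#0410# + (B - 16#C0#);  --  U+0410 .. U+044F, contiguous
      end if;
   end To_Code_Point;

   --  Convert a Windows-1251 string to UTF-8.  Every code point involved
   --  lies in the BMP, so Wide_String is a sufficient intermediate form.
   function To_UTF8 (Source : String) return String is
      Wide : Wide_String (Source'Range);
   begin
      for I in Source'Range loop
         Wide (I) := Wide_Character'Val (To_Code_Point (Source (I)));
      end loop;
      return Ada.Strings.UTF_Encoding.Wide_Strings.Encode (Wide);
   end To_UTF8;

begin
   --  16#D6# is Cyrillic capital Tse in Windows-1251, i.e. U+0426.
   Ada.Text_IO.Put_Line (To_UTF8 (Character'Val (16#D6#) & " = U+0426"));
end CP1251_Demo;
```

Saved as cp1251_demo.adb and built with GNAT, this should print Ц = U+0426 on a terminal that expects UTF-8. With Strings_Edit installed, the Encode call could presumably be replaced by concatenating Strings_Edit.UTF8.Image (To_Code_Point (...)) for each input byte, as the post suggests.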