comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: windows-1251 to utf-8
Date: Wed, 31 Oct 2018 11:01:47 +0100
Message-ID: <prbuib$t4g$1@gioia.aioe.org>
In-Reply-To: 74537c7a-18dd-421a-b3c2-6919285006cd@googlegroups.com

On 2018-10-31 03:57, eduardsapotski@gmail.com wrote:
> I get HTML from web-server in windows-1251 encoding.
> How do convert HTML in windows-1251 to utf-8?

The encoding table is this:

    https://en.wikipedia.org/wiki/Windows-1251

The 7-bit codes (16#00#..16#7F#) are encoded identically in UTF-8. For 
the 8-bit codes (for all codes, actually) you look up the Unicode code 
point in the table, e.g. Cyrillic capital Ц, which is 16#D6# in 
Windows-1251, has the code point 16#0426#, and convert that to a UTF-8 
sequence using, for example, this:

    http://www.dmitry-kazakov.de/ada/strings_edit.htm#7

The function Strings_Edit.UTF8.Image takes a code point and returns its 
UTF-8 encoding, so

    Strings_Edit.UTF8.Image (16#0426#)

gives Ц in UTF-8, i.e. the two octets 16#D0# 16#A6#.
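
The whole lookup-then-encode procedure might be sketched as follows. This 
is only an illustration under stated assumptions: it uses the standard 
Ada 2012 package Ada.Strings.UTF_Encoding.Wide_Wide_Strings in place of 
Strings_Edit.UTF8.Image, the names CP1251_Demo, To_Code_Point and To_UTF8 
are made up for the example, and the table covers only the 7-bit codes, 
the contiguous Cyrillic block 16#C0#..16#FF# and the letters Ё/ё. The 
remaining 8-bit codes would have to be filled in from the table above.

```ada
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure CP1251_Demo is

   --  Map one Windows-1251 octet to its Unicode code point.
   --  Deliberately partial (an assumption of this sketch): only the
   --  7-bit range, 16#C0#..16#FF# and the letters Ё/ё are handled.
   function To_Code_Point (Byte : Character) return Wide_Wide_Character is
      Code : constant Natural := Character'Pos (Byte);
   begin
      if Code < 16#80# then
         --  7-bit codes have the same value in Unicode
         return Wide_Wide_Character'Val (Code);
      elsif Code >= 16#C0# then
         --  А..я occupy 16#C0#..16#FF# and map to U+0410..U+044F
         return Wide_Wide_Character'Val (16#0410# + (Code - 16#C0#));
      elsif Code = 16#A8# then
         return Wide_Wide_Character'Val (16#0401#);  --  Ё
      elsif Code = 16#B8# then
         return Wide_Wide_Character'Val (16#0451#);  --  ё
      else
         raise Constraint_Error with "code not in this partial table";
      end if;
   end To_Code_Point;

   --  Decode every octet to a code point, then encode the result
   --  as UTF-8 with the standard library.
   function To_UTF8
     (Source : String) return Ada.Strings.UTF_Encoding.UTF_8_String
   is
      Decoded : Wide_Wide_String (Source'Range);
   begin
      for I in Source'Range loop
         Decoded (I) := To_Code_Point (Source (I));
      end loop;
      return Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode (Decoded);
   end To_UTF8;

begin
   --  16#D6# is Ц in Windows-1251
   Ada.Text_IO.Put_Line (To_UTF8 ((1 => Character'Val (16#D6#))));
end CP1251_Demo;
```

With the complete table in To_Code_Point the same To_UTF8 can be applied 
to the whole text received from the server. A plain array indexed by 
Character would do as well as the if-chain; the chain is used here only 
to keep the partial mapping short.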

HTML is a separate matter. Do you mean RFC 2396 escape sequences? Those 
are an alternative representation of characters and have nothing to do 
with Windows-1251.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
