From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: windows-1251 to utf-8
Date: Wed, 31 Oct 2018 11:01:47 +0100
Message-ID: <prbuib$t4g$1@gioia.aioe.org>
In-Reply-To: 74537c7a-18dd-421a-b3c2-6919285006cd@googlegroups.com
On 2018-10-31 03:57, eduardsapotski@gmail.com wrote:
> I get HTML from web-server in windows-1251 encoding.
> How do convert HTML in windows-1251 to utf-8?
The encoding table is this:
https://en.wikipedia.org/wiki/Windows-1251
The 7-bit codes (16#00#..16#7F#) are encoded in UTF-8 as themselves.
For the 8-bit codes (for all codes, actually) you take the code point
from the table, e.g. Cyrillic capital Ц -> 16#0426#, and convert it to
a UTF-8 sequence using, for example, this:
http://www.dmitry-kazakov.de/ada/strings_edit.htm#7
The function Strings_Edit.UTF8.Image takes a code point and returns its
UTF-8 equivalent, so

   Strings_Edit.UTF8.Image (16#0426#)

gives Ц in UTF-8.
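The two steps above (table lookup, then UTF-8 encoding) can be sketched
as a small stand-alone program. This is only an illustration under
stated assumptions, not the Strings_Edit implementation: the names
CP1251_To_UTF8_Demo, To_Code_Point, To_UTF8 and Convert are made up for
the example, To_UTF8 is a hand-written stand-in for
Strings_Edit.UTF8.Image that only covers code points below 16#800#, and
the lookup is deliberately partial. It handles the 7-bit codes, the
contiguous letters А..я at 16#C0#..16#FF# and Ё/ё; the remaining 8-bit
codes are irregular, must be copied from the table, and are replaced
with '?' here.

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure CP1251_To_UTF8_Demo is

   package SU renames Ada.Strings.Unbounded;

   --  Map one Windows-1251 octet to a Unicode code point.
   --  Partial table (assumption of this sketch): octets in
   --  16#80#..16#BF# other than Ё/ё are replaced with '?'.
   function To_Code_Point (Octet : Character) return Natural is
      B : constant Natural := Character'Pos (Octet);
   begin
      if B < 16#80# then
         return B;                          --  7-bit codes map to themselves
      elsif B >= 16#C0# then
         return 16#0410# + (B - 16#C0#);    --  А .. я, contiguous block
      elsif B = 16#A8# then
         return 16#0401#;                   --  Ё
      elsif B = 16#B8# then
         return 16#0451#;                   --  ё
      else
         return Character'Pos ('?');        --  not covered by this sketch
      end if;
   end To_Code_Point;

   --  Encode a code point below 16#800# as one or two UTF-8 octets.
   --  Stand-in for Strings_Edit.UTF8.Image, enough for the table above.
   function To_UTF8 (Code : Natural) return String is
   begin
      if Code < 16#80# then
         return (1 => Character'Val (Code));
      else
         return (Character'Val (16#C0# + Code / 64),
                 Character'Val (16#80# + Code mod 64));
      end if;
   end To_UTF8;

   --  Convert a whole Windows-1251 string, octet by octet.
   function Convert (Source : String) return String is
      Result : SU.Unbounded_String;
   begin
      for I in Source'Range loop
         SU.Append (Result, To_UTF8 (To_Code_Point (Source (I))));
      end loop;
      return SU.To_String (Result);
   end Convert;

   --  Ц is the single octet 16#D6# in Windows-1251.
   Input  : constant String := (1 => Character'Val (16#D6#));
   Output : constant String := Convert (Input);

begin
   --  U+0426 must come out as the two octets 16#D0# 16#A6#.
   if Output /= Character'Val (16#D0#) & Character'Val (16#A6#) then
      raise Program_Error with "unexpected UTF-8 sequence";
   end if;
   Ada.Text_IO.Put_Line (Output);
end CP1251_To_UTF8_Demo;
```

With Strings_Edit available, To_UTF8 could presumably be dropped in
favour of Strings_Edit.UTF8.Image, and a complete 256-entry constant
array would replace the if-chain in To_Code_Point.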
HTML is an unrelated story. Do you mean RFC 2396 escape sequences? This
is an alternative representation that has nothing to do with Windows-1251.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de