From: Björn Lundin
Newsgroups: comp.lang.ada
Subject: Re: windows-1251 to utf-8
Date: Thu, 1 Nov 2018 15:34:28 +0100

On 2018-11-01 14:26, Dmitry A. Kazakov wrote:
> On 2018-11-01 13:49, Björn Lundin wrote:
>
>> something like (with changes for 1251 instead of Latin_1 to be done)
>
> You probably mean 1252 which almost Latin-1.

I do.

> 1251 is totally different.
> it has Cyrillic letters in the upper half of 8-bit codes, in the place
> where 1252 keeps Central European letters with fancy diacritic marks.

And I also found that the code in my last post can be replaced by

-------------------------------------------------------
function To_Iso_Latin_15 (Str : Unicode.CES.Byte_Sequence) return String is
   use Unicode.Encodings;
begin
   return Convert (Str  => Str,
                   From => Get_By_Name ("utf-8"),
                   To   => Get_By_Name ("iso-8859-15"));
end To_Iso_Latin_15;
-------------------------------------------------------

I also see that the unicode package in xml/ada has support for
1251 and 1252.

package Unicode.CCS.Windows_1251 is
...

The withs are

with Ada.Exceptions;                   use Ada.Exceptions;
with Unicode.Names.Cyrillic;           use Unicode.Names.Cyrillic;
with Unicode.Names.Basic_Latin;        use Unicode.Names.Basic_Latin;
with Unicode.Names.Latin_1_Supplement; use Unicode.Names.Latin_1_Supplement;
with Unicode.Names.Currency_Symbols;   use Unicode.Names.Currency_Symbols;
with Unicode.Names.General_Punctuation; use Unicode.Names.General_Punctuation;
with Unicode.Names.Letterlike_Symbols; use Unicode.Names.Letterlike_Symbols;

which suggests to me that it is the Cyrillic one. That (I think) would
turn the function above into

-------------------------------------------------------
function To_Windows_1251 (Str : Unicode.CES.Byte_Sequence) return String is
   use Unicode.Encodings;
begin
   return Convert (Str  => Str,
                   From => Get_By_Name ("utf-8"),
                   To   => Get_By_Name ("Windows-1251"));
end To_Windows_1251;
-------------------------------------------------------

--
Björn
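PS. The subject line asks for the other direction (windows-1251 to utf-8). Swapping From and To in the same Convert call should presumably do it. This is an untested sketch: the name From_Windows_1251 is made up for illustration, and it assumes that Get_By_Name accepts "Windows-1251" as guessed above and that Byte_Sequence is a subtype of String, as in xml/ada.

```ada
with Unicode.CES;
with Unicode.Encodings;

--  Hypothetical helper (not part of xml/ada): takes a string of
--  Windows-1251 bytes and returns the same text encoded as UTF-8.
function From_Windows_1251 (Str : String) return Unicode.CES.Byte_Sequence is
   use Unicode.Encodings;
begin
   --  Assumption: "Windows-1251" is a name known to Get_By_Name.
   return Convert (Str  => Str,
                   From => Get_By_Name ("Windows-1251"),
                   To   => Get_By_Name ("utf-8"));
end From_Windows_1251;
```

If the encoding name is not recognised, the Get_By_Name call is where it would fail, so that is the first thing to check.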