From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,INVALID_MSGID autolearn=no autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,b3d252ea5c7b37a7
X-Google-Attributes: gid103376,public
From: mgk25@cl.cam.ac.uk (Markus Kuhn)
Subject: Re: ISO LATIN_1 in Windows 95 ?
Date: 1998/10/10
Message-ID: <6vo772$q4k$2@pegasus.csx.cam.ac.uk>#1/1
X-Deja-AN: 399730679
References: <3618A5D5.72C@ddre.dk> <6vde32$jle$1@news.net.uni-c.dk>
Organization: U of Cambridge Computer Lab, UK
Newsgroups: comp.lang.ada
Date: 1998-10-10T00:00:00+00:00
List-Id:

Jacob Sparre Andersen writes:

|> PS: I think CP 1252 refers to Unicode BMP - a 16 bit character encoding - so
|> it is "reasonable" that Windows 95 can't handle it.

No, this is wrong. Microsoft Code Page CP1252 refers to what the Windows
documentation erroneously calls the "ANSI Character Set".
CP1252 is an 8-bit character set that corresponds to ISO 8859-1 (Latin-1),
with the following additional characters in the range 128-159 (0x80-0x9F),
which is reserved in ANSI/ISO 8859-1 for control codes:

0x80  0x20AC  #EURO SIGN
0x81          #UNDEFINED
0x82  0x201A  #SINGLE LOW-9 QUOTATION MARK
0x83  0x0192  #LATIN SMALL LETTER F WITH HOOK
0x84  0x201E  #DOUBLE LOW-9 QUOTATION MARK
0x85  0x2026  #HORIZONTAL ELLIPSIS
0x86  0x2020  #DAGGER
0x87  0x2021  #DOUBLE DAGGER
0x88  0x02C6  #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89  0x2030  #PER MILLE SIGN
0x8A  0x0160  #LATIN CAPITAL LETTER S WITH CARON
0x8B  0x2039  #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8C  0x0152  #LATIN CAPITAL LIGATURE OE
0x8D          #UNDEFINED
0x8E  0x017D  #LATIN CAPITAL LETTER Z WITH CARON
0x8F          #UNDEFINED
0x90          #UNDEFINED
0x91  0x2018  #LEFT SINGLE QUOTATION MARK
0x92  0x2019  #RIGHT SINGLE QUOTATION MARK
0x93  0x201C  #LEFT DOUBLE QUOTATION MARK
0x94  0x201D  #RIGHT DOUBLE QUOTATION MARK
0x95  0x2022  #BULLET
0x96  0x2013  #EN DASH
0x97  0x2014  #EM DASH
0x98  0x02DC  #SMALL TILDE
0x99  0x2122  #TRADE MARK SIGN
0x9A  0x0161  #LATIN SMALL LETTER S WITH CARON
0x9B  0x203A  #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9C  0x0153  #LATIN SMALL LIGATURE OE
0x9D          #UNDEFINED
0x9E  0x017E  #LATIN SMALL LETTER Z WITH CARON
0x9F  0x0178  #LATIN CAPITAL LETTER Y WITH DIAERESIS

The official CP1252 to Unicode mapping can be found at

  ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT

There is no ANSI standard for CP1252; Microsoft is just referring to
ANSI/ISO 8859-1 when they say "ANSI character set", and they added the
above extensions. This is the usual sloppy terminology in documentation
for the "end user".

I hope that GNAT correctly transforms CP1252 to Unicode according to
the above table when it reads a Wide_String on a Windows platform.

Markus

-- 
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page:
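
Editorial illustration, not part of the original post: the relationship
between CP1252 and Latin-1 described in the table above can be checked with
a short script. This is only a sketch in Python (the post itself concerns
GNAT/Ada), and it assumes that Python's built-in "cp1252" and "latin-1"
codecs follow the mapping listed above; the helper name decode_both is
made up for this example.

```python
# Compare how a single byte decodes under CP1252 and ISO 8859-1 (Latin-1).
# Assumption: Python's built-in "cp1252" codec leaves the five unassigned
# positions undefined; other implementations may treat them differently.

def decode_both(byte_value):
    """Return (cp1252_char_or_None, latin1_char) for one byte value."""
    raw = bytes([byte_value])
    latin1 = raw.decode("latin-1")  # Latin-1 maps every byte to U+00xx
    try:
        cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252 = None  # position marked #UNDEFINED in the table
    return cp1252, latin1

# Positions in 0x80-0x9F that CP1252 leaves undefined.
undefined = [b for b in range(0x80, 0xA0) if decode_both(b)[0] is None]

# All byte values where the two character sets disagree.
differing = [b for b in range(0x100)
             if decode_both(b)[0] != decode_both(b)[1]]

if __name__ == "__main__":
    for b in range(0x80, 0xA0):
        cp1252, _ = decode_both(b)
        target = "UNDEFINED" if cp1252 is None else "U+%04X" % ord(cp1252)
        print("0x%02X  %s" % (b, target))
```

Under that assumption, the two character sets differ only in the range
0x80-0x9F and agree everywhere else, which is the point made above: CP1252
is an 8-bit extension of Latin-1, not a 16-bit encoding.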