From: Manuel Collado
Date: Tue, 05 Dec 2006 00:35:30 +0100
Newsgroups: comp.lang.ada
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT

Adam Beneschan wrote:
> Björn Persson wrote:
>> Adam Beneschan wrote:
>>
>>> However, at first glance, I didn't see a way to get Wide_Text_IO to
>>> read a UCS-1 text file.
>> Hmm, I've never heard of UCS-1. Is such an encoding really defined?
>
> I don't know if that's the correct name. I have seen it referenced in
> a few places.

To clarify things:
- Character set - mapping of characters to integers (the so-called 'codepoints')
- Character encoding - mapping of a sequence of codepoints to a sequence of bytes

UCS-1 means encoding each character (codepoint) as a single byte whose numerical value is just the codepoint. It can be used only for codepoints in the range (0..255).

UCS-1 is the natural, implicit encoding of all 8-bit (and 7-bit) character sets.
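
The one-byte-per-codepoint scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `encode_ucs1` and `decode_ucs1` are made up for this example, and "UCS-1" is used in the informal sense given here, not as the name of a standardized encoding.

```python
def encode_ucs1(text: str) -> bytes:
    """Encode each codepoint as a single byte whose value is the codepoint.

    Hypothetical helper for illustration; only codepoints 0..255 fit.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)  # character -> codepoint (the "character set" step)
        if cp > 0xFF:
            raise ValueError(f"U+{cp:04X} does not fit in one byte")
        out.append(cp)  # codepoint -> byte (the "encoding" step)
    return bytes(out)


def decode_ucs1(data: bytes) -> str:
    """Decode by treating every byte value as a codepoint."""
    return "".join(chr(b) for b in data)


sample = "Bj\u00f6rn"  # o-with-diaeresis is codepoint 16#F6#
encoded = encode_ucs1(sample)
print(encoded.hex())                   # 426af6726e
print(decode_ucs1(encoded) == sample)  # True
```

For text restricted to codepoints 0..255, Python's built-in `latin-1` codec produces the same bytes, so `sample.encode("latin-1")` could be used in place of the hand-written loop.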
>
>>> This is the encoding where each byte in the
>>> range 16#00#..16#FF# represents a character in the range
>>> Wide_Character'Val(16#0000#) .. Wide_Character'Val(16#00FF#), and there
>>> is no way to represent wide characters from 16#0100# to 16#FFFF#.

Yes, this is UCS-1.

>> OK, so it's identical to ISO 8859-1.
>
> Technically, I thought ISO-8859-1 was a mapping from a range of
> integers to a set of characters, rather than a specification of how
> characters are represented in bits in an actual file. I could be
> wrong. The distinction gets blurry at times.

Quite true. Technically, ISO-8859-1 is a character set (not a character encoding). It is usually encoded as UCS-1, as are a lot of other character sets.

Regrettably, the terms 'character set' and 'character encoding' are used as synonyms in a lot of places.

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado