From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5d4095813b818c7d
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII
Date: Tue, 05 Dec 2006 00:35:30 +0100
From: Manuel Collado <m.collado@fi.upm.es>
User-Agent: Thunderbird 1.5 (Windows/20051201)
MIME-Version: 1.0
Newsgroups: comp.lang.ada
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT
References: <1164916470.648544.256710@n67g2000cwd.googlegroups.com>
   <kFpch.25227$E02.10276@newsb.telia.net>
 <1165256255.486012.132810@l12g2000cwl.googlegroups.com>
In-Reply-To: <1165256255.486012.132810@l12g2000cwl.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
NNTP-Posting-Host: 138.100.242.201
Message-ID: <4574b0c2@news.upm.es>
X-Trace: 5 Dec 2006 00:35:30 +0100, 138.100.242.201
Path: 
 g2news2.google.com!news3.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!aotearoa.belnet.be!news.belnet.be!news.rediris.es!news.upm.es!138.100.242.201
Xref: g2news2.google.com comp.lang.ada:7803
Date: 2006-12-05T00:35:30+01:00
List-Id: <comp.lang.ada>

Adam Beneschan wrote:
> Björn Persson wrote:
>> Adam Beneschan wrote:
>>
>>> However, at first glance, I didn't see a way to get Wide_Text_IO to
>>> read a UCS-1 text file.
>> Hmm, I've never heard of UCS-1. Is such an encoding really defined?
> 
> I don't know if that's the correct name.  I have seen it referenced in
> a few places.

To clarify things:
- Character set - a mapping of characters to integers (the so-called 
'codepoints')
- Character encoding - a mapping of a sequence of codepoints to a sequence 
of bytes
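
A small Python sketch of that distinction, in case it helps. It assumes 
Unicode as the character set (that is what Python's ord() reports), and the 
sample string is just an illustration:

```python
# Character set: every character maps to an integer codepoint.
text = "Bj\u00f6rn"                    # "Bjorn" with o-diaeresis (U+00F6)
codepoints = [ord(ch) for ch in text]  # [66, 106, 246, 114, 110]

# Character encoding: the same codepoints map to different byte sequences,
# depending on which encoding is chosen.
as_latin1 = text.encode("latin-1")     # one byte per codepoint
as_utf8 = text.encode("utf-8")         # U+00F6 takes two bytes here
as_utf16 = text.encode("utf-16-be")    # two bytes per codepoint here

print(codepoints)
print(as_latin1, as_utf8, as_utf16)
```

The codepoints never change; only the bytes written to the file do.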

UCS-1 means encoding each character (codepoint) as a single byte whose 
numerical value is just the codepoint. It can be used only for codepoints in 
the range 0..255. UCS-1 is the natural, implicit encoding of all 8-bit 
(and 7-bit) character sets.
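
As a rough illustration of the one-byte-per-codepoint scheme described 
above, here is a minimal Python sketch. The names ucs1_encode and 
ucs1_decode are made up for this example and are not part of any library:

```python
def ucs1_encode(text: str) -> bytes:
    """Encode each codepoint as one byte holding the codepoint's value."""
    codepoints = [ord(ch) for ch in text]
    for cp in codepoints:
        if cp > 0xFF:
            # Codepoints above 255 do not fit in a single byte.
            raise ValueError(f"codepoint {cp:#06x} is outside 0..255")
    return bytes(codepoints)


def ucs1_decode(data: bytes) -> str:
    """Decode each byte as the codepoint with the same numerical value."""
    return "".join(chr(b) for b in data)


sample = "Bj\u00f6rn"            # contains U+00F6, which is below 256
encoded = ucs1_encode(sample)
print(encoded, ucs1_decode(encoded) == sample)
```

For Unicode codepoints 0..255 this yields the same bytes as Python's 
built-in "latin-1" codec.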

> 
>>> This is the encoding where each byte in the
>>> range  16#00#..16#FF# represents a character in the range
>>> Wide_Character'Val(16#0000#) .. Wide_Character'Val(16#00FF#), and there
>>> is no way to represent wide characters from 16#0100# to 16#FFFF#.

Yes, this is UCS-1.

>> OK, so it's identical to ISO 8859-1.
> 
> Technically, I thought ISO-8859-1 was a mapping from a range of
> integers to a set of characters, rather than a specification of how
> characters are represented in bits in an actual file.  I could be
> wrong.  The distinction gets blurry at times.

Quite true. Technically, ISO-8859-1 is a character set (not a character 
encoding). It is usually encoded as UCS-1, as are a lot of other character 
sets.

Regrettably, the terms 'character set' and 'character encoding' are used as 
synonyms in a lot of places.

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado