From: Manuel Collado <m.collado@fi.upm.es>
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT
Date: Tue, 05 Dec 2006 00:35:30 +0100
Message-ID: <4574b0c2@news.upm.es>
In-Reply-To: <1165256255.486012.132810@l12g2000cwl.googlegroups.com>
Adam Beneschan wrote:
> Björn Persson wrote:
>> Adam Beneschan wrote:
>>
>>> However, at first glance, I didn't see a way to get Wide_Text_IO to
>>> read a UCS-1 text file.
>> Hmm, I've never heard of UCS-1. Is such an encoding really defined?
>
> I don't know if that's the correct name. I have seen it referenced in
> a few places.
To clarify things:
- Character set - a mapping of characters to integers (the so-called
  'codepoints').
- Character encoding - a mapping of a sequence of codepoints to a sequence
  of bytes.
UCS-1 means encoding each character (codepoint) as a single byte whose
numerical value is the codepoint itself. It can be used only for codepoints
in the range 0..255. UCS-1 is the natural, implicit encoding of all 8-bit
(and 7-bit) character sets.
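To make the distinction concrete, here is a minimal sketch in Python (not Ada, just because it is short). The function name encode_one_byte_per_codepoint is made up for this illustration; it is not a library routine. It takes the codepoints of a string and writes each one as a single byte, which is the UCS-1 scheme described above, and then compares the result with UTF-8, a different encoding of the very same codepoints.

```python
def encode_one_byte_per_codepoint(text: str) -> bytes:
    """Hypothetical helper: one byte per codepoint, byte value == codepoint."""
    codepoints = [ord(ch) for ch in text]  # character set step: chars -> integers
    for cp in codepoints:
        if cp > 255:
            # This scheme has no way to represent codepoints above 255.
            raise ValueError(f"codepoint {cp:#x} does not fit in one byte")
    return bytes(codepoints)  # encoding step: integers -> bytes


text = "Björn"
codepoints = [ord(ch) for ch in text]
one_byte = encode_one_byte_per_codepoint(text)
utf8 = text.encode("utf-8")  # same codepoints, different byte sequence

print(codepoints)
print(one_byte)
print(utf8)
```

The codepoint list is the same in both cases; only the byte sequences differ, and that difference is the encoding.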
>
>>> This is the encoding where each byte in the
>>> range 16#00#..16#FF# represents a character in the range
>>> Wide_Character'Val(16#0000#) .. Wide_Character'Val(16#00FF#), and there
>>> is no way to represent wide characters from 16#0100# to 16#FFFF#.
Yes, this is UCS-1.
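The reading direction can be sketched the same way (again in Python, with a made-up helper name, widen_bytes). Each byte becomes the character whose codepoint equals the byte value, which is the analogue of Wide_Character'Val applied to each byte. The comparison with Python's built-in "latin-1" codec is only a cross-check, since that codec performs the same byte-to-codepoint mapping.

```python
def widen_bytes(data: bytes) -> str:
    """Hypothetical helper: map each byte 16#00#..16#FF# to the codepoint of equal value."""
    return "".join(chr(b) for b in data)


all_bytes = bytes(range(256))
widened = widen_bytes(all_bytes)

# Every resulting character lies in the range 0 .. 16#FF#.
highest = max(ord(ch) for ch in widened)

# Cross-check against the standard "latin-1" codec.
same_as_latin1 = widened == all_bytes.decode("latin-1")
print(highest, same_as_latin1)
```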
>> OK, so it's identical to ISO 8859-1.
>
> Technically, I thought ISO-8859-1 was a mapping from a range of
> integers to a set of characters, rather than a specification of how
> characters are represented in bits in an actual file. I could be
> wrong. The distinction gets blurry at times.
Quite true. Technically, ISO-8859-1 is a character set (not a character
encoding). It is usually encoded as UCS-1, as are a lot of other character sets.
Regrettably, the terms 'character set' and 'character encoding' are used as
synonyms in a lot of places.
Regards.
--
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado