comp.lang.ada
From: Manuel Collado <m.collado@fi.upm.es>
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT
Date: Tue, 05 Dec 2006 00:35:30 +0100
Message-ID: <4574b0c2@news.upm.es>
In-Reply-To: <1165256255.486012.132810@l12g2000cwl.googlegroups.com>

Adam Beneschan wrote:
> Björn Persson wrote:
>> Adam Beneschan wrote:
>>
>>> However, at first glance, I didn't see a way to get Wide_Text_IO to
>>> read a UCS-1 text file.
>> Hmm, I've never heard of UCS-1. Is such an encoding really defined?
> 
> I don't know if that's the correct name.  I have seen it referenced in
> a few places.

To clarify things:
- Character set - a mapping of characters to integers (the so-called
'codepoints')
- Character encoding - a mapping of a sequence of codepoints to a sequence of
bytes

UCS-1 means encoding each character (codepoint) as a single byte whose
numerical value is just the codepoint. It can be used only for codepoints in
the range 0..255. UCS-1 is the natural, implicit encoding of all 8-bit
(and 7-bit) character sets.

> 
>>> This is the encoding where each byte in the
>>> range  16#00#..16#FF# represents a character in the range
>>> Wide_Character'Val(16#0000#) .. Wide_Character'Val(16#00FF#), and there
>>> is no way to represent wide characters from 16#0100# to 16#FFFF#.

Yes, this is UCS-1.
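
As a minimal sketch of that mapping (not something taken from GNAT or from
Wide_Text_IO itself), each byte can be widened to the Wide_Character that has
the same codepoint. The names UCS1_Demo and UCS1_To_Wide are hypothetical,
invented for this illustration only:

```ada
with Ada.Text_IO;

procedure UCS1_Demo is

   --  Hypothetical helper, not part of any standard package:
   --  widen each 8-bit byte to the Wide_Character with the same codepoint.
   function UCS1_To_Wide (Bytes : String) return Wide_String is
      Result : Wide_String (Bytes'Range);
   begin
      for I in Bytes'Range loop
         Result (I) := Wide_Character'Val (Character'Pos (Bytes (I)));
      end loop;
      return Result;
   end UCS1_To_Wide;

   --  "caf" followed by the byte 16#E9# (e with acute accent in ISO-8859-1).
   Sample : constant String      := "caf" & Character'Val (16#E9#);
   Wide   : constant Wide_String := UCS1_To_Wide (Sample);

begin
   --  Print the codepoint of each resulting wide character.
   for I in Wide'Range loop
      Ada.Text_IO.Put_Line (Integer'Image (Wide_Character'Pos (Wide (I))));
   end loop;
end UCS1_Demo;
```

The standard library already provides this conversion as To_Wide_String
(in Ada.Characters.Handling in Ada 95, and in Ada.Characters.Conversions
since Ada 2005), so in practice one could read lines with plain Text_IO and
widen them afterwards.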

>> OK, so it's identical to ISO 8859-1.
> 
> Technically, I thought ISO-8859-1 was a mapping from a range of
> integers to a set of characters, rather than a specification of how
> characters are represented in bits in an actual file.  I could be
> wrong.  The distinction gets blurry at times.

Quite true. Technically, ISO-8859-1 is a character set (not a character
encoding). It is usually encoded as UCS-1, as are a lot of other character sets.
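
To make the distinction concrete, here is a small sketch that encodes the
same ISO-8859-1 codepoint in two different ways. UTF-8 is brought in here
only as a contrasting encoding and is my own choice of example. The names
Encodings_Demo, Byte, Byte_Array, UCS1, UTF8 and Show are all hypothetical:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Encodings_Demo is

   type Byte is mod 2 ** 8;
   type Byte_Array is array (Positive range <>) of Byte;

   --  Codepoints of an 8-bit character set such as ISO-8859-1.
   subtype Latin_1_Codepoint is Natural range 0 .. 16#FF#;

   --  UCS-1: one byte whose value is the codepoint itself.
   function UCS1 (CP : Latin_1_Codepoint) return Byte_Array is
   begin
      return (1 => Byte (CP));
   end UCS1;

   --  UTF-8, restricted to the same codepoint range: one byte below
   --  16#80#, otherwise a lead byte followed by a continuation byte.
   function UTF8 (CP : Latin_1_Codepoint) return Byte_Array is
   begin
      if CP < 16#80# then
         return (1 => Byte (CP));
      else
         return (1 => Byte (16#C0# + CP / 64),
                 2 => Byte (16#80# + CP mod 64));
      end if;
   end UTF8;

   --  Print a label followed by the decimal value of each byte.
   procedure Show (Label : String; Bytes : Byte_Array) is
   begin
      Put (Label);
      for I in Bytes'Range loop
         Put (Byte'Image (Bytes (I)));
      end loop;
      New_Line;
   end Show;

begin
   --  Codepoint 16#E9# is e with acute accent in ISO-8859-1.
   Show ("UCS-1:", UCS1 (16#E9#));
   Show ("UTF-8:", UTF8 (16#E9#));
end Encodings_Demo;
```

The character set (which integer stands for which character) is the same in
both calls; only the byte sequence written to the file differs.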

Regrettably, the terms 'character set' and 'character encoding' are used as
synonyms in a lot of places.

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado


