From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5d4095813b818c7d
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII
Path: 
 g2news2.google.com!postnews.google.com!l12g2000cwl.googlegroups.com!not-for-mail
From: "Adam Beneschan" <adam@irvine.com>
Newsgroups: comp.lang.ada
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT
Date: 6 Dec 2006 18:02:55 -0800
Organization: http://groups.google.com
Message-ID: <1165456975.595248.177740@l12g2000cwl.googlegroups.com>
References: <1164916470.648544.256710@n67g2000cwd.googlegroups.com>
   <kFpch.25227$E02.10276@newsb.telia.net>
   <1165256255.486012.132810@l12g2000cwl.googlegroups.com>
   <4574b0c2@news.upm.es>
   <lDIdh.25626$E02.10478@newsb.telia.net>
NNTP-Posting-Host: 66.126.103.122
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1165456980 31361 127.0.0.1 (7 Dec 2006 02:03:00
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Thu, 7 Dec 2006 02:03:00 +0000 (UTC)
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
 rv:1.7.12) Gecko/20050922 Fedora/1.7.12-1.3.1,gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: l12g2000cwl.googlegroups.com; posting-host=66.126.103.122;
   posting-account=cw1zeQwAAABOY2vF_g6V_9cdsyY_wV9w
Xref: g2news2.google.com comp.lang.ada:7838
Date: 2006-12-06T18:02:55-08:00
List-Id: <comp.lang.ada>

Björn Persson wrote:
> Manuel Collado wrote:
> > UCS-1 means encoding each character (codepoint) as a single byte whose
> > numerical value is just the codepoint. Can be used only for codepoints in
> > the range (0..255). UCS-1 is the natural, implicit encoding of all 8-bit
> > (and 7-bits) character sets.
>
> I'd still like to know where UCS-1 is defined, and by whom.
> http://www.iana.org/assignments/character-sets lists ISO-10646-UCS-2,
> ISO-10646-UCS-4 and ISO-10646-UCS-Basic, but no UCS-1.
> http://www.unicode.org/glossary/#U also has entries for UCS-2 and UCS-4,
> but no UCS-1.

UCS-Basic may be the "official" name for what I'm talking about.
Unfortunately, I'm having trouble figuring it out.  The IANA website
you referred me to is titled "Character Sets", but some of the things
listed underneath are encoding standards (UTF-8, etc.) rather than
character sets; UCS-Basic is listed as a "subset of Unicode", however,
and Unicode is a character set (not an encoding; there are multiple
ways to encode Unicode characters, including UTF-8, UTF-16, UCS-2).  So
this page just exemplifies the sort of confusion Manuel referred to.  A
quick Google search hasn't provided any further enlightenment on
exactly what UCS-Basic is.  Specifically, I can't tell whether it's a
character set or an encoding.
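
To illustrate the character-set-versus-encoding distinction, here is a small Python sketch (not tied to GNAT or Wide_Text_IO; the choice of character and the list of encodings are just for illustration). It takes one Unicode character, U+00F6, and shows that its byte representation depends entirely on which encoding is chosen:

```python
# One abstract character from the Unicode character set ...
ch = "\u00f6"  # LATIN SMALL LETTER O WITH DIAERESIS, code point U+00F6

# ... and several encodings, each producing a different byte sequence.
# These are standard Python codec names; the selection is illustrative.
encodings = ["latin-1", "utf-8", "utf-16-be", "utf-32-be"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    # Print the codec name and the resulting bytes in hex.
    print(f"{name:10} {data.hex(' ')}")
```

The code point is the same in every case; only the bytes differ, which is why a name that does not say whether it means the set or the byte layout is hard to pin down.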

UCS-2 and UCS-4 are representations in which if an integer N maps to a
character, then that character is represented simply by a 2- or 4-byte
binary representation of N (byte ordering is an issue, though).  So it
would seem logical that UCS-1 would simply refer to a 1-byte binary
representation of a number.  That's how it seemed to me, and I did find
other references to this term, so I figured it was the correct term.
But maybe it isn't official.
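
A minimal Python sketch of that reading, assuming "UCS-1" simply means the 1-byte analogue of UCS-2 and UCS-4 (the function name `ucs_fixed` is made up for this example, and "UCS-1" itself may not be an official term):

```python
def ucs_fixed(code_point: int, width: int, byteorder: str = "big") -> bytes:
    """Encode a code point N as a plain unsigned integer of `width` bytes.

    Hypothetical helper: width=2 and width=4 correspond to the UCS-2 and
    UCS-4 idea described above; width=1 is the unofficial "UCS-1" reading.
    """
    if code_point < 0 or code_point >= 1 << (8 * width):
        raise ValueError(f"code point {code_point:#x} does not fit in {width} byte(s)")
    return code_point.to_bytes(width, byteorder)


n = ord("\u00f6")  # N = 246 for U+00F6

one = ucs_fixed(n, 1)                 # single byte, only works for N in 0..255
two_be = ucs_fixed(n, 2, "big")       # 2 bytes, big-endian
two_le = ucs_fixed(n, 2, "little")    # 2 bytes, little-endian: same N, bytes swapped
four_be = ucs_fixed(n, 4, "big")      # 4 bytes, big-endian

for label, data in [("1-byte", one), ("2-byte BE", two_be),
                    ("2-byte LE", two_le), ("4-byte BE", four_be)]:
    print(f"{label:10} {data.hex(' ')}")
```

The two 2-byte results differ only in byte order, which is the byte-ordering issue mentioned above. For code points 0..255 the 1-byte result is the same byte that Python's `latin-1` codec produces.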

Sigh....

                               -- Adam