From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5d4095813b818c7d
X-Google-Attributes: gid103376,public
X-Google-Language: ENGLISH,ASCII
Path: g2news2.google.com!postnews.google.com!l12g2000cwl.googlegroups.com!not-for-mail
From: "Adam Beneschan"
Newsgroups: comp.lang.ada
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT
Date: 6 Dec 2006 18:02:55 -0800
Organization: http://groups.google.com
Message-ID: <1165456975.595248.177740@l12g2000cwl.googlegroups.com>
References: <1164916470.648544.256710@n67g2000cwd.googlegroups.com> <1165256255.486012.132810@l12g2000cwl.googlegroups.com> <4574b0c2@news.upm.es>
NNTP-Posting-Host: 66.126.103.122
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Trace: posting.google.com 1165456980 31361 127.0.0.1 (7 Dec 2006 02:03:00 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Thu, 7 Dec 2006 02:03:00 +0000 (UTC)
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.12) Gecko/20050922 Fedora/1.7.12-1.3.1,gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: l12g2000cwl.googlegroups.com; posting-host=66.126.103.122; posting-account=cw1zeQwAAABOY2vF_g6V_9cdsyY_wV9w
Xref: g2news2.google.com comp.lang.ada:7838
Date: 2006-12-06T18:02:55-08:00
List-Id:

Björn Persson wrote:
> Manuel Collado wrote:
> > UCS-1 means encoding each character (codepoint) as a single byte whose
> > numerical value is just the codepoint. Can be used only for codepoints in
> > the range (0..255). UCS-1 is the natural, implicit encoding of all 8-bit
> > (and 7-bits) character sets.
>
> I'd still like to know where UCS-1 is defined, and by whom.
> http://www.iana.org/assignments/character-sets lists ISO-10646-UCS-2,
> ISO-10646-UCS-4 and ISO-10646-UCS-Basic, but no UCS-1.
> http://www.unicode.org/glossary/#U also has entries for UCS-2 and UCS-4,
> but no UCS-1.

UCS-Basic may be the "official" name for what I'm talking about.
Unfortunately, I'm having trouble figuring it out. The IANA page you
referred me to is titled "Character Sets", but some of the things listed
on it are encoding standards (UTF-8, etc.) rather than character sets.
UCS-Basic is listed as a "subset of Unicode", however, and Unicode is a
character set (not an encoding; there are multiple ways to encode Unicode
characters, including UTF-8, UTF-16, UCS-2). So this page just
exemplifies the sort of confusion Manuel referred to.

A quick Google search hasn't provided any further enlightenment on
exactly what UCS-Basic is. Specifically, I can't tell whether it's a
character set or an encoding.

UCS-2 and UCS-4 are representations in which, if an integer N maps to a
character, that character is represented simply by the 2- or 4-byte
binary representation of N (byte ordering is an issue, though). So it
would seem logical that UCS-1 would simply refer to a 1-byte binary
representation of N. That's how it seemed to me, and I did find other
references to this term, so I figured it was the correct term. But maybe
it isn't official. Sigh....

-- Adam
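
The fixed-width idea described above (code point N stored as a 1-, 2- or
4-byte binary integer, with byte order as a free choice) can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the function name encode_fixed_width and its parameters are
hypothetical, and "UCS-1" is used in the informal sense from this thread,
not as an officially registered name.

```python
def encode_fixed_width(text: str, width: int, byteorder: str = "big") -> bytes:
    """Encode each code point N as a `width`-byte binary integer.

    Hypothetical helper for illustration: width=1 is the informal "UCS-1"
    of this thread, width=2 behaves like UCS-2, width=4 like UCS-4.
    """
    limit = 1 << (8 * width)  # first value that no longer fits in `width` bytes
    out = bytearray()
    for ch in text:
        n = ord(ch)  # the integer N that maps to this character
        if n >= limit:
            raise ValueError(f"U+{n:04X} does not fit in {width} byte(s)")
        out += n.to_bytes(width, byteorder)  # byte order only matters for width > 1
    return bytes(out)


if __name__ == "__main__":
    name = "Bj\u00f6rn"  # "Björn"; every code point is below 256
    print(encode_fixed_width(name, 1).hex(" "))
    print(encode_fixed_width(name, 2, "big").hex(" "))
    print(encode_fixed_width(name, 2, "little").hex(" "))
```

For code points 0..255 the 1-byte form produces the same bytes as
ISO-8859-1, which is why it looks like the "natural" encoding of 8-bit
character sets; anything above U+00FF is rejected rather than silently
truncated.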