Date: Mon, 11 Dec 2006 20:49:02 +0100
From: Manuel Collado
Newsgroups: comp.lang.ada
Subject: Re: Reading "normal" text files with Wide_Text_IO in GNAT
Message-ID: <457db62d@news.upm.es>
In-Reply-To: <1165456975.595248.177740@l12g2000cwl.googlegroups.com>

Adam Beneschan wrote:
> Björn Persson wrote:
>> ...
>> I'd still like to know where UCS-1 is defined, and by whom.
>> http://www.iana.org/assignments/character-sets lists ISO-10646-UCS-2,
>> ISO-10646-UCS-4 and ISO-10646-UCS-Basic, but no UCS-1.
>> http://www.unicode.org/glossary/#U also has entries for UCS-2 and UCS-4,
>> but no UCS-1.
> ...
> UCS-2 and UCS-4 are representations in which if an integer N maps to a
> character, then that character is represented simply by a 2- or 4-byte
> binary representation of N (byte ordering is an issue, though). So it
> would seem logical that UCS-1 would simply refer to a 1-byte binary
> representation of a number.
> That's how it seemed to me, and I did find
> other references to this term, so I figured it was the correct term.
> But maybe it isn't official.

Well, it seems that there are no official names for simple, direct
encodings (not tied to a given character set). In fact, UCS-2 and UCS-4
are specific names for Unicode stuff (UCS means Universal Character Set).

Character encoding concepts are precisely defined in:

http://en.wikipedia.org/wiki/Character_encoding

As you can see, the encoding issue is made up of two separate ideas: the
CEF (character encoding form) and the CES (character encoding scheme).
Some of the more recent encodings have explicit names. But the direct
CEFs are so simple that they don't need explicit names (just the size of
the code value).

If we take UCS-2 and UCS-4 out of the Unicode world and use them as
general names for direct CEFs with 16-bit and 32-bit code values, then
UCS-1 becomes the natural name for the direct CEF with 8-bit code values,
whether it is official or not.

Regards.
-- 
Manuel Collado - http://lml.ls.fi.upm.es/~mcollado
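To make the idea of a "direct" CEF concrete, here is a minimal Python sketch (an illustration added here, not part of the original post). It models a CEF whose only parameter is the code-unit size: 1, 2 or 4 bytes, i.e. "UCS-1", "UCS-2" and "UCS-4" in the generic sense discussed above. The helper names `direct_encode` and `direct_decode` are hypothetical, and the `big_endian` flag stands in for the separate CES (byte order) choice. Unlike real UTF-16, the 2-byte form simply rejects code points above U+FFFF instead of using surrogate pairs.

```python
import struct

# Hypothetical helpers (not from the post): a "direct" CEF stores each code
# point as exactly one fixed-width code unit, so its only parameter is the
# unit size. Byte order is the separate CES choice.
_UNIT_FORMATS = {1: "B", 2: "H", 4: "I"}  # unsigned 8-, 16-, 32-bit struct codes


def direct_encode(text: str, unit_bytes: int, big_endian: bool = True) -> bytes:
    """Encode each code point of text as one unsigned integer of unit_bytes bytes."""
    fmt = _UNIT_FORMATS[unit_bytes]
    limit = 1 << (8 * unit_bytes)
    code_points = [ord(ch) for ch in text]
    for cp in code_points:
        if cp >= limit:
            # No surrogates or multi-unit sequences: the value must fit directly.
            raise ValueError(f"U+{cp:04X} does not fit in {unit_bytes} byte(s)")
    order = ">" if big_endian else "<"  # standard sizes, no padding
    return struct.pack(order + fmt * len(code_points), *code_points)


def direct_decode(data: bytes, unit_bytes: int, big_endian: bool = True) -> str:
    """Inverse of direct_encode: read fixed-width units back as code points."""
    if len(data) % unit_bytes:
        raise ValueError("data length is not a multiple of the code-unit size")
    fmt = _UNIT_FORMATS[unit_bytes]
    order = ">" if big_endian else "<"
    units = struct.unpack(order + fmt * (len(data) // unit_bytes), data)
    return "".join(chr(cp) for cp in units)  # chr rejects values above U+10FFFF


if __name__ == "__main__":
    name = "Björn"
    print(direct_encode(name, 1).hex())  # 426af6726e ("UCS-1")
    print(direct_encode(name, 2).hex())  # 0042006a00f60072006e ("UCS-2", big-endian)
    print(direct_encode(name, 4, big_endian=False).hex())  # little-endian "UCS-4"
```

With a 1-byte code unit, any text limited to the first 256 code points comes out byte-for-byte identical to its ISO-8859-1 encoding, since Latin-1 coincides with that range of the UCS.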