From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=0.4 required=5.0 tests=BAYES_00,FORGED_MUA_MOZILLA
	autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5bcc293dc5642650
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII
Received: by 10.68.38.134 with SMTP id g6mr3576721pbk.6.1318958857644;
        Tue, 18 Oct 2011 10:27:37 -0700 (PDT)
Path: 
 d5ni28049pbc.0!nntp.google.com!news1.google.com!news3.google.com!feeder.news-service.com!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: "J-P. Rosen" <rosen@adalog.fr>
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 19:27:37 +0200
Organization: A noiseless patient Spider
Message-ID: <j7kcu7$gcg$1@dont-email.me>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <j7i6va$nso$1@munin.nbi.dk>
 <7156122c-b63f-487e-ad1b-0edcc6694a7a@u10g2000prl.googlegroups.com>
 <ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com>
 <1tggwi1yicf5z.1q3xra9r00oyb$.dlg@40tude.net>
 <dce57c61-b582-4f1d-ba0a-ffc18e9c4c3b@p27g2000prp.googlegroups.com>
Mime-Version: 1.0
Injection-Date: Tue, 18 Oct 2011 17:27:36 +0000 (UTC)
Injection-Info: mx04.eternal-september.org;
 posting-host="kNGZANgPhxsSLx11YhGgCw";
	logging-data="16784"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX18KUH1+GiKDk9u/7Cy7hntf"
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
In-Reply-To: 
 <dce57c61-b582-4f1d-ba0a-ffc18e9c4c3b@p27g2000prp.googlegroups.com>
Cancel-Lock: sha1:5p2jlVyYB8XqqjHEMjI8/5q0WWM=
Xref: news1.google.com comp.lang.ada:18571
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Date: 2011-10-18T19:27:37+02:00
List-Id: <comp.lang.ada>

Le 18/10/2011 17:34, Adam Beneschan a �crit :
> On Oct 18, 12:55 am, "Dmitry A. Kazakov" <mail...@dmitry-kazakov.de>
> wrote:
>> On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote:
>>> I have a feeling you're fundamentally confused about what UTF-8 is, as
>>> compared to "Latin-1".  Latin-1 is a character mapping.  It defines,
>>> for all integers in the range 0..255, what character that integer
>>> represents (e.g. 77 represents 'M', etc.).  Unicode is a character
>>> mapping that defines characters for a much larger integer range.
>>
>> No, Unicode is a standard describes character mappings. Both UTF-8 and
>> Latin-1 are encodings. Latin-1 as an encoding has a property that there is
>> 1-1 octet to code point correspondence, at the cost that some (most) of
>> code points cannot be represented by the encoding. UTF-8 lacks this
>> property, but is capable to represent all code points.
> 
> Sigh... I guess you're right about the term "Latin-1".  It appears to
> be *both* a character mapping *and* an encoding, based on a bit of
> Wikipedia research.  The problem for me is this: what does that make
> Latin-2, Latin-3, KOI8-R, etc.?  Those seem to describe the same
> encoding mechanism as Latin-1 (each code represented as one 8-bit
> byte), but with different meanings for the codes in the 16#A0#..16#FF#
> range.  So the same encoding scheme seems to have multiple different
> names.  That's very confusing to me.
> 
Not 100% sure, but I think  here is the picture.
1) Code points are always 31 bits (or maybe 30).
2) Below is the lower left corner of BMP (use fixed fonts!):

|
|____________________
|         |         |
| Latin 1 | Latin 2 |
|_________|_________|_______

The lower halves of Latin-1 and Latin-2 are identical, i.e. the same
characters have two different code-points, differing by 256.

When you use Latin-1 with 8 bit bytes, you can view this as an encoding
with the 24 upper bits being 16#00_00_00#. When you use Latin-2 with 8
bit bytes, you can view this as an encoding with the 24 upper bits being
16#00_00_01#.

So in a sense, Latin-1 and Latin-2 are both character sets, and when
represented on only 8 bits, an encoding.

Does this make sense?
-- 
---------------------------------------------------------
           J-P. Rosen (rosen@adalog.fr)
Adalog a d�m�nag� / Adalog has moved:
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00