From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5bcc293dc5642650
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.68.11.199 with SMTP id s7mr1748641pbb.5.1318924477444;
        Tue, 18 Oct 2011 00:54:37 -0700 (PDT)
Path: 
 d5ni25917pbc.0!nntp.google.com!news2.google.com!goblin2!goblin.stu.neva.ru!aioe.org!.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 09:55:07 +0200
Organization: cbb software GmbH
Message-ID: <1tggwi1yicf5z.1q3xra9r00oyb$.dlg@40tude.net>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <j7i6va$nso$1@munin.nbi.dk>
 <7156122c-b63f-487e-ad1b-0edcc6694a7a@u10g2000prl.googlegroups.com>
 <ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com>
Reply-To: mailbox@dmitry-kazakov.de
NNTP-Posting-Host: FbOMkhMtVLVmu7IwBnt1tw.user.speranza.aioe.org
Mime-Version: 1.0
X-Complaints-To: abuse@aioe.org
User-Agent: 40tude_Dialog/2.0.15.1
X-Notice: Filtered by postfilter v. 0.8.2
Xref: news2.google.com comp.lang.ada:14025
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Date: 2011-10-18T09:55:07+02:00
List-Id: <comp.lang.ada>

On Mon, 17 Oct 2011 18:10:35 -0700 (PDT), Adam Beneschan wrote:

> I have a feeling you're fundamentally confused about what UTF-8 is, as
> compared to "Latin-1".  Latin-1 is a character mapping.  It defines,
> for all integers in the range 0..255, what character that integer
> represents (e.g. 77 represents 'M', etc.).  Unicode is a character
> mapping that defines characters for a much larger integer range.

No, Unicode is a standard that describes character mappings. Both UTF-8 and
Latin-1 are encodings. Latin-1 as an encoding has the property that octets
and code points correspond 1-1, at the cost that most code points cannot be
represented by the encoding. UTF-8 lacks this property, but is capable of
representing all code points.
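
A minimal sketch of that difference, assuming String is used as a plain
container of octets. The names Code_Point and To_UTF8 are made up for
illustration, and the sketch does not reject surrogates:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Encoding_Demo is
   --  Hypothetical type covering the Unicode code space.
   type Code_Point is range 0 .. 16#10FFFF#;

   --  Encode one code point as 1 to 4 UTF-8 octets held in a String.
   function To_UTF8 (C : Code_Point) return String is
   begin
      if C < 16#80# then
         return (1 => Character'Val (C));
      elsif C < 16#800# then
         return (1 => Character'Val (16#C0# + C / 16#40#),
                 2 => Character'Val (16#80# + C mod 16#40#));
      elsif C < 16#1_0000# then
         return (1 => Character'Val (16#E0# + C / 16#1000#),
                 2 => Character'Val (16#80# + (C / 16#40#) mod 16#40#),
                 3 => Character'Val (16#80# + C mod 16#40#));
      else
         return (1 => Character'Val (16#F0# + C / 16#4_0000#),
                 2 => Character'Val (16#80# + (C / 16#1000#) mod 16#40#),
                 3 => Character'Val (16#80# + (C / 16#40#) mod 16#40#),
                 4 => Character'Val (16#80# + C mod 16#40#));
      end if;
   end To_UTF8;

   --  U+00E9 in Latin-1: the octet value equals the code point.
   E_Latin_1 : constant String := (1 => Character'Val (16#E9#));
   --  The same code point in UTF-8 needs two octets (C3 A9).
   E_UTF_8   : constant String := To_UTF8 (16#E9#);
   --  U+20AC has no Latin-1 representation; UTF-8 uses three octets.
   Euro      : constant String := To_UTF8 (16#20AC#);
begin
   Put_Line ("U+00E9 Latin-1 octets:" & Natural'Image (E_Latin_1'Length));
   Put_Line ("U+00E9 UTF-8 octets:  " & Natural'Image (E_UTF_8'Length));
   Put_Line ("U+20AC UTF-8 octets:  " & Natural'Image (Euro'Length));
end Encoding_Demo;
```

The octet counts come out as one, two and three respectively: Latin-1 keeps
the 1-1 correspondence but stops at U+00FF, UTF-8 gives it up and covers
everything.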

> Because of this, it is not feasible to work with strings or characters
> in UTF-8 encoding.  Suppose you declare a string
> 
>    S : String (1 .. 100);
> 
> but you want it to be a UTF-8 string.  How would that work?  If you
> want to look at S(50), the computer would have to start at the
> beginning of the string and figure out whether each character is
> represented as 1 or 2 bytes.  Nobody wants that.

Nobody actually cares, because strings are not processed that way. String
indices are obtained in the course of operations which keep them at the
beginnings of properly encoded code points.

It is a language problem to distinguish an index (a value of some index
type) from a position (a cardinal number). Ada does this BTW.

When you write S(50), what is 50 here? The 50th character (code point)
counting from the beginning of the string, or the index 50 of a character
whose position is unknown without looking into the string? Considering the
declaration of String, it is not clear whether Positive is a position or a
proper index. For the latter, S(50) simply does not read as "the 50th
character". Furthermore, it is not guaranteed that if 50 is a valid index
then 51 is valid too.
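
To illustrate, here is a rough sketch, assuming a String that holds
well-formed UTF-8 octets. Is_Start and Next are hypothetical helpers written
for this post, not operations of any standard package:

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Index_Demo is
   --  "naive" with U+00EF in the middle, encoded in UTF-8 as C3 AF.
   --  Five code points, six octets, indices 1 .. 6.
   S : constant String :=
     "na" & Character'Val (16#C3#) & Character'Val (16#AF#) & "ve";

   --  True if Text (I) begins an encoded code point, that is, it is
   --  not a continuation octet in 16#80# .. 16#BF#.
   function Is_Start (Text : String; I : Positive) return Boolean is
   begin
      return Text (I) not in
        Character'Val (16#80#) .. Character'Val (16#BF#);
   end Is_Start;

   --  Index of the code point following the one that starts at I.
   --  Returns Text'Last + 1 when there is none.
   function Next (Text : String; I : Positive) return Positive is
      J : Positive := I + 1;
   begin
      while J <= Text'Last and then not Is_Start (Text, J) loop
         J := J + 1;
      end loop;
      return J;
   end Next;

   Index    : Positive := S'First;
   Position : Natural  := 0;
begin
   --  Indices are only ever produced by Next, so they always stay at
   --  the beginning of a code point.
   while Index <= S'Last loop
      Position := Position + 1;
      Put_Line ("position" & Natural'Image (Position)
                & " starts at index" & Positive'Image (Index));
      Index := Next (S, Index);
   end loop;
end Index_Demo;
```

Here index 3 is valid but index 4 is not (it falls in the middle of a code
point), and the 4th character starts at index 5. Nothing in the loop ever
needs to count from the beginning of the string.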

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de