From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.9 required=5.0 tests=BAYES_00,FORGED_GMAIL_RCVD,
	FREEMAIL_FROM autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5bcc293dc5642650
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII
Received: by 10.68.46.193 with SMTP id x1mr4667292pbm.7.1318978451271;
        Tue, 18 Oct 2011 15:54:11 -0700 (PDT)
Path: 
 d5ni29263pbc.0!nntp.google.com!news1.google.com!postnews.google.com!y22g2000pri.googlegroups.com!not-for-mail
From: ytomino <aghia05@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 15:54:10 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: 
 <7e997382-55c1-45af-aa82-0d2067840e8b@y22g2000pri.googlegroups.com>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <j7i6va$nso$1@munin.nbi.dk>
 <7156122c-b63f-487e-ad1b-0edcc6694a7a@u10g2000prl.googlegroups.com>
 <ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com>
 <409c81ab-bd54-493b-beb4-a0cca99ec306@p27g2000prp.googlegroups.com>
 <d831c4d8-3540-44cb-8976-e588e22b4c59@l10g2000pra.googlegroups.com>
NNTP-Posting-Host: 118.6.135.155
Mime-Version: 1.0
X-Trace: posting.google.com 1318978451 30556 127.0.0.1 (18 Oct 2011 22:54:11
 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Tue, 18 Oct 2011 22:54:11 +0000 (UTC)
Complaints-To: groups-abuse@google.com
Injection-Info: y22g2000pri.googlegroups.com; posting-host=118.6.135.155;
 posting-account=Mi71UQoAAACnFhXo1NVxPlurinchtkIj
User-Agent: G2/1.0
X-Google-Web-Client: true
X-Google-Header-Order: HNKUARELSC
X-HTTP-UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_5_8)
 AppleWebKit/535.1 (KHTML,
 like Gecko) Chrome/14.0.835.202 Safari/535.1,gzip(gfe)
Xref: news1.google.com comp.lang.ada:18581
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Date: 2011-10-18T15:54:10-07:00
List-Id: <comp.lang.ada>

On Oct 19, 12:02 am, Adam Beneschan <a...@irvine.com> wrote:
> I think we have a terminology problem.
OK, sorry that I did not lay out my argument clearly.
Let me confirm the terms.

> Latin-1 is a set of characters (a subset of the full Unicode character
> set).
Yes.
And it is also used as the name of an encoding (ISO 8859-1, as Yannick
calls it).

> So I get
> confused when people talk about Latin-1 versus UTF-8 strings as if
> they were mutually exclusive.  They're not, the way I understand the
> terms.  You can have a string composed of Latin-1 characters that's
> represented using UTF-8 encoding; and the bits in that string would be
> different from a string of the same Latin-1 characters using the
> "regular" encoding, if any character in the string is in the 16#80#..
> 16#FF# range.

Yes.
"Latin-1 as character set" is not mutually exclusive with Unicode
(UCS-2 or UCS-4).
"Latin-1 as encoding" is mutually exclusive with UTF-8.
And what I (we?) have been talking about is "Latin-1 as encoding".
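
To make the two encodings concrete, here is a minimal sketch. It assumes
an Ada 2012 compiler that provides Ada.Strings.UTF_Encoding.Strings; the
procedure name is only for illustration. The same single Latin-1
character occupies one byte in "Latin-1 as encoding" and two bytes in
UTF-8.

```ada
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Text_IO;

procedure Latin_1_Versus_UTF_8 is
   --  One Latin-1 character, LC_E_Acute (U+00E9), stored as byte 16#E9#.
   Latin_1 : constant String := (1 => Character'Val (16#E9#));

   --  The same character re-encoded as UTF-8 by the standard library.
   UTF_8 : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1);
begin
   Ada.Text_IO.Put_Line
     ("Latin-1 length:" & Integer'Image (Latin_1'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 length:  " & Integer'Image (UTF_8'Length));

   --  Print each UTF-8 byte; for U+00E9 these are 195 and 169
   --  (16#C3# 16#A9#).
   for C of UTF_8 loop
      Ada.Text_IO.Put_Line (Integer'Image (Character'Pos (C)));
   end loop;
end Latin_1_Versus_UTF_8;
```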

> On the other hand, I was confused by your statement
> "Ada.Character.Handling.To_Upper breaks UTF-8".  I don't even see a
> way for this to make sense.  Ada.Characters.Handling works on
> character types, and a character type is an enumeration type; but a
> UTF-8 "character" can't be an enumeration type at all, since it's a
> variable-length sequence of 8-bit bytes.  I'm not quite sure what you
> meant here.

Ada.Characters and Ada.Strings are defined to work with "Latin-1 as
encoding" in the String type.
Some subprograms in them (like To_Upper), if invoked on a String
holding UTF-8, will replace bytes in the upper half (16#80#..16#FF#)
with meaningless values. (Equal_Case_Insensitive does not replace
characters, but returns a meaningless result if its parameters contain
upper-half characters encoded as UTF-8.)
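
A small sketch of the breakage, with an illustrative procedure name and
a sample character chosen by me (U+3042, HIRAGANA LETTER A). Its UTF-8
lead byte 16#E3# is LC_A_Tilde when read as Latin-1, so To_Upper
rewrites it to 16#C3#.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure To_Upper_On_UTF_8 is
   --  U+3042 (HIRAGANA LETTER A) as UTF-8 bytes: 16#E3# 16#81# 16#82#.
   Source : constant String :=
     (Character'Val (16#E3#),
      Character'Val (16#81#),
      Character'Val (16#82#));

   --  To_Upper treats every byte as one Latin-1 character.
   Result : constant String := Ada.Characters.Handling.To_Upper (Source);
begin
   --  Print the resulting byte values in decimal.
   --  The first byte changes from 227 (16#E3#) to 195 (16#C3#);
   --  the other two bytes are control characters in Latin-1 and
   --  are left alone.
   for C of Result loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end To_Upper_On_UTF_8;
```

The result 16#C3# 16#81# 16#82# no longer encodes U+3042: 16#C3# 16#81#
now reads as U+00C1, and the trailing 16#82# is a stray continuation
byte, so the string is not even valid UTF-8.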

Of course, Ada.Wide_Wide_Characters.Handling.To_Upper
(UTF_Encoding.Wide_Wide_Strings.Decode (any UTF-8 encoded string))
works fine.
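
Spelled out as a sketch, assuming an Ada 2012 compiler that provides
Ada.Strings.UTF_Encoding.Wide_Wide_Strings and
Ada.Wide_Wide_Characters.Handling (the procedure name and the sample
character are mine):

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Wide_Wide_Characters.Handling;
with Ada.Text_IO;

procedure To_Upper_Via_Decode is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  U+00E9 (LATIN SMALL LETTER E WITH ACUTE) as UTF-8: 16#C3# 16#A9#.
   Source : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     (Character'Val (16#C3#), Character'Val (16#A9#));

   --  Decode to code points first, then change case on code points.
   Upper : constant Wide_Wide_String :=
     Ada.Wide_Wide_Characters.Handling.To_Upper (UTF.Decode (Source));

   --  Encode back to UTF-8 only at the end.
   Result : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     UTF.Encode (Upper);
begin
   --  The upper-case form U+00C9 should come out as 195 137
   --  (16#C3# 16#89#), which is still valid UTF-8.
   for C of Result loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end To_Upper_Via_Decode;
```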

> As to having utilities such as versions of Ada.Strings.Unbounded or
> Ada.Strings.Fixed that work directly on UTF-8-encoded strings (and
> versions of Ada.Characters that operate on single UTF-8-encoded
> characters): it's certainly possible to write a package like that, and
> anyone is free to do so, but I just don't think they'd be widely used
> enough to add to the Standard.  I could be wrong.

I thought the standard library was going to separate UTF-8 from
Latin-1 when I read about the UTF-8 mode of the Form parameter that
Randy mentioned. Latin-1 is not something I normally use, so I wanted
UTF-8 versions of Ada.Characters. Sorry that my personal wish got mixed
in. But it is certain that the standard library is lacking in some
respects when handling non-ASCII file names.

By the way...

I will probably confuse you more :-)
Do you know that a single code point is NOT necessarily a single
letter on display?
Unicode has "composed characters": there are cases where several
code points represent a single real letter.
(see http://www.unicode.org/reports/tr15/tr15-33.html)
In addition, Unicode has "variation selectors". A variation selector
decorates the preceding letter (and can be combined with composed
characters).
(see http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html)

Therefore, handling Wide_Wide_String is in fact about as difficult as
handling an encoded (UTF-8 or other format) string.
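
A minimal sketch of the composed-character case, using an example of my
own choosing: "e with acute" written once as the single code point
U+00E9 and once as "e" followed by U+0301 (COMBINING ACUTE ACCENT).
Both display as the same letter, but as Wide_Wide_String values they
differ in length and do not compare equal.

```ada
with Ada.Text_IO;

procedure Code_Points_Versus_Letters is
   --  One code point: U+00E9.
   Precomposed : constant Wide_Wide_String :=
     (1 => Wide_Wide_Character'Val (16#00E9#));

   --  Two code points: 'e' followed by U+0301.
   Decomposed : constant Wide_Wide_String :=
     ('e', Wide_Wide_Character'Val (16#0301#));
begin
   Ada.Text_IO.Put_Line
     ("Precomposed length:" & Integer'Image (Precomposed'Length));
   Ada.Text_IO.Put_Line
     ("Decomposed length: " & Integer'Image (Decomposed'Length));

   --  Predefined "=" compares code points, not displayed letters,
   --  so this prints FALSE.
   Ada.Text_IO.Put_Line
     ("Equal: " & Boolean'Image (Precomposed = Decomposed));
end Code_Points_Versus_Letters;
```

Treating the two forms as the same letter needs normalization as
described in the tr15 report above, which this sketch does not attempt.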