From mboxrd@z Thu Jan  1 00:00:00 1970
From: ytomino <aghia05@gmail.com>
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Mon, 17 Oct 2011 19:32:04 -0700 (PDT)
Organization: http://groups.google.com
Message-ID: 
 <409c81ab-bd54-493b-beb4-a0cca99ec306@p27g2000prp.googlegroups.com>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <j7i6va$nso$1@munin.nbi.dk>
 <7156122c-b63f-487e-ad1b-0edcc6694a7a@u10g2000prl.googlegroups.com>
 <ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com>

On Oct 18, 10:10 am, Adam Beneschan <a...@irvine.com> wrote:
> On Oct 17, 4:47 pm, ytomino <aghi...@gmail.com> wrote:
>
> > On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>
> > > Say what?
>
> > > Ada.Strings.Encoding (new in Ada 2012) uses a subtype of String to store
> > > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so
> > > was "a violation of the standard".
>
> > > The intent has always been that Open, Ada.Directories, etc. take UTF-8
> > > strings as an option. Presumably the implementation would use a Form to
> > > specify that the file names in UTF-8 form rather than Latin-1. (I wasn't
> > > able to find a reference for this in a quick search, but I know it has been
> > > talked about on several occasions.)
>
> > > One of the primary reasons that Ada.Strings.Encoding uses a subtype of
> > > String rather than a separate type is so that it can be passed to Open and
> > > the like.
>
> > > It's probably true that we should standardize on the Form needed to use
> > > UTF-8 strings in these contexts, or at least come up with Implementation
> > > Advice on that point.
>
> > >                                         Randy.
>
> > Good news. Thanks for letting know.
> > My worry is decreased a little.
>
> > However, even if that is right, Form parameters are missing for many
> > subprograms.
> > Probably, All subprograms in Ada.Directories,
> > Ada.Directories.Hierarchical_File_Names, Ada.Command_Line,
> > Ada.Environment_Variables and other subprograms having Name parameter
> > or returning a file name should have Form parameter.
> > (For example, I do Open (X, Form => "UTF-8"). Which does Name (X)
> > returns UTF-8 or Latin-1?)
>
> > Moreover, in the future, we will always use I/O subprograms as UTF-8
> > mode if what you say is realized.
> > But other libraries in the standard are explicitly defined as Latin-1.
> > It's certain that Ada.Character.Handling.To_Upper breaks UTF-8.
>
> I have a feeling you're fundamentally confused about what UTF-8 is, as
> compared to "Latin-1". Latin-1 is a character mapping. It defines,
> for all integers in the range 0..255, what character that integer
> represents (e.g. 77 represents 'M', etc.). Unicode is a character
> mapping that defines characters for a much larger integer range. For
> integers in the range 0..255, the character represented in Unicode is
> the same as that in Latin-1; higher integers represent characters in
> other alphabets, other symbols, etc. Those mappings just tell you
> what symbols go with what numbers, and they don't say anything about
> how the numbers are supposed to be stored.
>
> UTF-8 is an encoding (representation). It defines, for each non-
> negative integer up to a certain point, what bits are used to
> represent that integer. The number of bits is not fixed. So even if
> you're working with characters all in the 0..255 range, some of those
> characters will be represented in 8 bits (one byte) and some will take
> 16 bits (two bytes).
>
> Because of this, it is not feasible to work with strings or characters
> in UTF-8 encoding. Suppose you declare a string
>
>    S : String (1 .. 100);
>
> but you want it to be a UTF-8 string. How would that work? If you
> want to look at S(50), the computer would have to start at the
> beginning of the string and figure out whether each character is
> represented as 1 or 2 bytes. Nobody wants that.
>
> The only sane way to work with strings in memory is to use a format
> where every character is the same size (String if all your characters
> are in the 0..255 range, Wide_String for 0..65535, Wide_Wide_String
> for 0..2**32-1). Then, if you have a string of bytes in UTF-8 format,
> you convert it to a regular (Wide_)(Wide_)String with routines in
> Ada.Strings.UTF_Encoding; and it also has routines for converting
> regular strings to UTF-8 format. But you don't want to *keep* strings
> in memory and work with them in UTF-8 format. That's why it doesn't
> make sense to have string routines (like
> Ada.Strings.Equal_Case_Insensitive or Ada.Character_Handling.To_Upper)
> that work with UTF-8.
>
> Hope this solves your problem.
>
>                               -- Adam

I'm not confused. You are misreading me.

Of course, if applications always hold file names as Wide_Wide_String
and encode them to UTF-8 only when calling I/O subprograms, as you
say, then it's very simple, and that is perhaps the intended method. I
understand that.
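
A minimal sketch of that approach, assuming an Ada 2012 compiler. The
procedure name and the file name are made up for illustration, and
whether the implementation interprets the Name parameter as UTF-8 is
implementation-defined (it may need a Form string, or may just pass the
bytes through to the OS):

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Create_Named_File is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Hypothetical file name, held internally as Wide_Wide_String.
   --  16#E9# is LATIN SMALL LETTER E WITH ACUTE.
   Name : constant Wide_Wide_String :=
     "r" & Wide_Wide_Character'Val (16#E9#) & "sum"
     & Wide_Wide_Character'Val (16#E9#) & ".txt";

   File : Ada.Text_IO.File_Type;
begin
   --  Encode to UTF-8 only at the I/O boundary.  Assumption: the
   --  implementation accepts a UTF-8 encoded String as a file name.
   Ada.Text_IO.Create
     (File => File,
      Mode => Ada.Text_IO.Out_File,
      Name => UTF.Encode (Name));
   Ada.Text_IO.Put_Line (File, "hello");
   Ada.Text_IO.Close (File);
end Create_Named_File;
```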

But where do these file names come from?
They are usually given on the command line or in a configuration file
(written by the user).
They are probably encoded in UTF-8 if the locale setting of the OS is
UTF-8.
So Form parameters are necessary for the subprograms in
Ada.Command_Line, and it's natural to keep the names in UTF-8.
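
To illustrate, here is a rough sketch of what an application has to do
today with Ada 2012, assuming the arguments arrive as UTF-8 bytes in a
String (that depends on the OS locale and on the implementation). The
procedure name is made up. An argument that is not valid UTF-8 raises
Encoding_Error, so the only safe fallback is to keep the raw String:

```ada
with Ada.Command_Line;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Show_Arguments is
   package UTF renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
begin
   for I in 1 .. Ada.Command_Line.Argument_Count loop
      declare
         --  Raw bytes exactly as the implementation delivers them.
         Raw : constant String := Ada.Command_Line.Argument (I);
      begin
         declare
            --  Assumption: the argument is UTF-8 encoded.
            Decoded : constant Wide_Wide_String := UTF.Decode (Raw);
         begin
            Ada.Text_IO.Put_Line
              ("argument" & Integer'Image (I) & ":"
               & Natural'Image (Decoded'Length) & " code points");
         end;
      exception
         when Ada.Strings.UTF_Encoding.Encoding_Error =>
            --  Not valid UTF-8: do not re-encode, keep the bytes as-is.
            Ada.Text_IO.Put_Line
              ("argument" & Integer'Image (I)
               & ": not valid UTF-8, keeping raw bytes");
      end;
   end loop;
end Show_Arguments;
```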

(Some file systems, such as those on Linux, accept invalid byte
sequences as valid file names. Applications must not (cannot?)
decode/encode file names in this case.
A file name that looks broken may be correct, depending on the user's
LANG setting.
The same is true of NTFS/HFS+. These file systems can accept broken
UTF-16. Strictly speaking, an application should never encode/decode
file names. But Ada has decided that file names are stored in String
(according to Randy). So we have to give up on UTF-16 file systems.)

Also, it's common in many other libraries and languages for text
processing functions to work on strings kept in encoded form. I do not
necessarily want to reject the Ada way, but I feel your opinion is
prejudiced. In fact, it is not as difficult as you say.
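
For example, here is a minimal sketch (Ada 2012, names made up for
illustration) that counts the code points of a UTF-8 string without
decoding it. It only relies on the fact that UTF-8 continuation bytes
have the bit pattern 10xxxxxx, and it assumes the input is well-formed:

```ada
with Ada.Text_IO;

procedure Count_Code_Points is

   --  True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
   function Is_Continuation (C : Character) return Boolean is
     (Character'Pos (C) in 16#80# .. 16#BF#);

   --  Every byte that is not a continuation byte starts a code point.
   function Code_Point_Count (S : String) return Natural is
      Count : Natural := 0;
   begin
      for C of S loop
         if not Is_Continuation (C) then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Code_Point_Count;

   --  E WITH ACUTE encoded as the bytes C3 A9, then "t", then C3 A9.
   Sample : constant String :=
     Character'Val (16#C3#) & Character'Val (16#A9#) & "t"
     & Character'Val (16#C3#) & Character'Val (16#A9#);

begin
   Ada.Text_IO.Put_Line
     ("bytes:" & Natural'Image (Sample'Length)
      & ", code points:" & Natural'Image (Code_Point_Count (Sample)));
end Count_Code_Points;
```

Finding the Nth code point works the same way, with one linear scan
over the bytes.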