From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
X-Google-Thread: 103376,5bcc293dc5642650
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.204.156.155 with SMTP id x27mr279608bkw.7.1318940825248;
        Tue, 18 Oct 2011 05:27:05 -0700 (PDT)
Path: 
 l23ni13560bkv.0!nntp.google.com!news2.google.com!goblin2!goblin.stu.neva.ru!aioe.org!.POSTED!not-for-mail
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 14:27:34 +0200
Organization: cbb software GmbH
Message-ID: <1944zexmfvu0c.15c6rf05wbv94.dlg@40tude.net>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <xp2dsxe7fuo0.144jc7zraglb2$.dlg@40tude.net>
 <JL-dnRevK71EGwTT4p2dnAA@giganews.com>
 <1gzuyf8eg0o0k.7yo8q1lqfiyr.dlg@40tude.net>
 <NfOdnfncLvw57gbT4p2dnAA@giganews.com>
 <4j9sogywhu37.99zyvbiqma79.dlg@40tude.net>
 <96CdnQ48jI_VxgDTRVn_vwA@giganews.com>
Reply-To: mailbox@dmitry-kazakov.de
NNTP-Posting-Host: FbOMkhMtVLVmu7IwBnt1tw.user.speranza.aioe.org
Mime-Version: 1.0
X-Complaints-To: abuse@aioe.org
User-Agent: 40tude_Dialog/2.0.15.1
X-Notice: Filtered by postfilter v. 0.8.2
Xref: news2.google.com comp.lang.ada:14048
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Date: 2011-10-18T14:27:34+02:00
List-Id: <comp.lang.ada>

On Tue, 18 Oct 2011 06:55:54 -0400, Peter C. Chapin wrote:

> On 2011-10-17 03:59, Dmitry A. Kazakov wrote:
> 
>> Wrong. All file systems share common features, which can and must be
>> properly abstracted. System-specific are the implementations, not the
>> package specifications.
> 
> Not all possible file system features, even common ones, are abstracted 
> by the standard.

Maybe, but the code points of a file name are not that kind of feature. Each
file system in the end operates on Unicode code points, even if it does not
support Unicode.

>> Because inability to spell the file name is not same as lacking access
>> rights. Access rights are external to the program code. The file name,
>> coded as a string literal is a part of the program. Failure of the former
>> is not a bug. The latter is a bug, because the file exists, is accessible
>> and has proper name. A program bug which cannot be fixed is a language
>> design bug.
> 
> I don't see it the same way. Extended attributes also exist, are 
> accessible (to the system), and have names. Yet the standard doesn't 
> allow you to access them.

It would be the same if the standard did not allow access to file names at
all. But it does allow that, though inconsistently.

Not doing something is not a bug. A bug is when something is done wrong.

> The issue of character set handling is slippery business, as you know. 
> Perhaps the fundamental problem is that Unicode text is essentially 
> binary data.

No, Unicode text is a sequence of code points, which can be represented
using various encodings. Only a particular representation is binary data.
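
To illustrate the distinction, here is a minimal sketch in Python (not Ada,
and not taken from any Ada library). The sample text and variable names are
assumptions chosen for illustration. It shows one sequence of code points
with several different octet representations:

```python
# One abstract text: three code points (LATIN CAPITAL A, e-acute, EURO SIGN).
text = "A\u00e9\u20ac"

# The code points are the same no matter how the text is later encoded.
code_points = [ord(ch) for ch in text]

# Each encoding yields a different sequence of octets for the same text.
utf8_octets = text.encode("utf-8")
utf16_octets = text.encode("utf-16-le")
utf32_octets = text.encode("utf-32-le")

# Decoding any of the representations recovers the same code points.
decoded = [
    utf8_octets.decode("utf-8"),
    utf16_octets.decode("utf-16-le"),
    utf32_octets.decode("utf-32-le"),
]
```

Here `code_points` is `[0x41, 0xE9, 0x20AC]` in every case, while the three
octet sequences differ from one another.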

> For example when reading a Unicode file one needs to treat 
> it as a binary file and then decode the contents (into String, 
> Wide_String or Wide_Wide_String as desired) as it is read.

Well, that depends on the semantics of these types. If we consider them
character strings, then you are wrong. Character strings are not
representations; they are just chains of Unicode code points constrained to
some set of code points, as Wide_String is [*].

Reading lines of a *text* file as Wide_String or as Wide_Wide_String
assumes an appropriate decoding rather than mindless shuffling of chunks of
memory. Ideally, from an *Ada* implementation I would expect that when a
UTF-8 encoded text file is read as Wide_String, I would get exactly the same
sequence of code points as encoded in UTF-8, or Data_Error for those that
cannot be represented. I see no problem in implementing it this way and in
requiring such implementations by the standard. For raw binary I/O there are
streams and direct I/O of Unsigned_8 or whatever octet/memory unit type.
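
The behaviour described above can be sketched as follows. This is Python,
not Ada, and only a model of the semantics under stated assumptions:
`DataError` is a hypothetical stand-in for Ada's Data_Error,
`read_as_wide_string` is an invented name, and Wide_Character is assumed to
cover code points up to 16#FFFF#.

```python
class DataError(Exception):
    """Hypothetical stand-in for Ada's Data_Error."""


# Assumed upper bound of Wide_Character (the Basic Multilingual Plane).
WIDE_CHARACTER_LAST = 0xFFFF


def read_as_wide_string(raw: bytes) -> str:
    """Decode UTF-8 octets into code points that fit in Wide_Character."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Malformed input is reported as a data error, not passed through.
        raise DataError(f"malformed UTF-8: {exc}") from exc
    for index, char in enumerate(text):
        if ord(char) > WIDE_CHARACTER_LAST:
            raise DataError(
                f"code point U+{ord(char):04X} at position {index} "
                "cannot be represented in Wide_Character"
            )
    return text
```

With this sketch, the octets `b"\xe2\x82\xac"` decode to the single code
point U+20AC, whereas the UTF-8 encoding of U+1F600 raises `DataError`
because that code point lies outside the assumed Wide_Character range.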

> Personally the idea of holding on to encoded data in memory seems like a 
> bad idea. I know some programming languages store strings internally in 
> "UTF-8 format" but that never made sense to me. UTF-8 encoded data is 
> binary data. It should be put into an array of bytes or have a new type 
> for it. I definitely don't want to accidentally mix "normal" strings of 
> (decoded) characters with UTF-8 encoded strings. I have a feeling, 
> Dmitry, this is what you are also saying.

Yes, I too wish we had separate string types for UTF-8 and UTF-16. It is
IMO bad to mandate UTF-8 for Ada.Directories. Rather, it should be extended
with Wide_Wide_String versions, as should Ada.Text_IO and all other packages
where file names appear.
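
A rough sketch of what keeping encoded and decoded strings apart could look
like, again in Python rather than Ada. `Utf8String` and `concat` are
hypothetical names invented for this illustration. In Ada a distinct type
would reject the mix at compile time, while the Python version can only
check at run time:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utf8String:
    """Encoded text: UTF-8 octets, deliberately not a str."""

    octets: bytes

    @classmethod
    def encode(cls, text: str) -> "Utf8String":
        # Conversion from decoded text is always explicit.
        return cls(text.encode("utf-8"))

    def decode(self) -> str:
        # Conversion back to code points is explicit as well.
        return self.octets.decode("utf-8")


def concat(left: str, right: str) -> str:
    """Concatenate decoded text only; refuse encoded octets."""
    if not isinstance(left, str) or not isinstance(right, str):
        raise TypeError("concat expects decoded text, not encoded octets")
    return left + right
```

Passing a `Utf8String` to `concat` raises `TypeError`, so encoded data
cannot be mixed into a "normal" string without a visible `decode` call.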

I would also have file paths, file names, file extensions, etc. properly
typed, i.e. not as raw strings, but that is another story for another day.

-----------------------
* An alternative interpretation could be that Wide_String is a UCS-2
(+endianness specification) encoding. But that would be a bad idea for a
higher-level language such as Ada.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de