From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Why no Ada.Wide_Directories?
Date: Tue, 18 Oct 2011 14:27:34 +0200
Organization: cbb software GmbH
Message-ID: <1944zexmfvu0c.15c6rf05wbv94.dlg@40tude.net>
References: <9937871.172.1318575525468.JavaMail.geo-discussion-forums@prib32>
 <418b8140-fafb-442f-b91c-e22cc47f8adb@y22g2000pri.googlegroups.com>
 <1gzuyf8eg0o0k.7yo8q1lqfiyr.dlg@40tude.net>
 <4j9sogywhu37.99zyvbiqma79.dlg@40tude.net>
 <96CdnQ48jI_VxgDTRVn_vwA@giganews.com>
Reply-To: mailbox@dmitry-kazakov.de

On Tue, 18 Oct 2011 06:55:54 -0400, Peter C. Chapin wrote:

> On 2011-10-17 03:59, Dmitry A. Kazakov wrote:
>
>> Wrong. All file systems share common features, which can and must be
>> properly abstracted. System-specific are the implementations, not the
>> package specifications.
>
> Not all possible file system features, even common ones, are abstracted
> by the standard.

Maybe, but the code point of a file name is not that kind of feature.
Each file system in the end operates on Unicode code points, even if it
does not support Unicode.

>> Because inability to spell the file name is not same as lacking access
>> rights. Access rights are external to the program code. The file name,
>> coded as a string literal is a part of the program. Failure of the former
>> is not a bug. The latter is a bug, because the file exists, is accessible
>> and has proper name. A program bug which cannot be fixed is a language
>> design bug.
>
> I don't see it the same way. Extended attributes also exist, are
> accessible (to the system), and have names. Yet the standard doesn't
> allow you to access them.

It would be the same if the standard did not allow access to file names at
all. But it does allow that, though inconsistently. Not doing something is
not a bug. A bug is when something is done wrong.

> The issue of character set handling is slippery business, as you know.
> Perhaps the fundamental problem is that Unicode text is essentially
> binary data.

No, Unicode text is a sequence of code points, which can be represented
using various encodings. It is that particular representation which is
binary data.

> For example when reading a Unicode file one needs to treat
> it as a binary file and then decode the contents (into String,
> Wide_String or Wide_Wide_String as desired) as it is read.

Well, that depends on the semantics of these types. If we consider them
character strings, then you are wrong. Character strings are not
representations; they are just chains of Unicode code points constrained to
some set of code points, as Wide_String is [*]. Reading lines of a *text*
file as Wide_String or as Wide_Wide_String assumes an appropriate decoding
rather than mindless shuffling of chunks of memory.

Ideally, from an *Ada* implementation I would expect that when a
UTF-8-encoded text file is read as Wide_String, I would get exactly the same
sequences of code points as in the UTF-8 text, or Data_Error for those which
cannot be represented.
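
As a rough illustration of that expectation (in Python rather than Ada,
purely as a sketch): DataError, WIDE_LAST and decode_as_wide below are
made-up names, and the 16#FFFF# upper bound is an assumed model of what a
Wide_String element can hold, not something quoted from the standard.

```python
class DataError(Exception):
    """Hypothetical stand-in for Ada's Data_Error."""


# Assumption: a Wide_String element can hold code points 0 .. 16#FFFF#.
WIDE_LAST = 0xFFFF


def decode_as_wide(raw):
    """Decode UTF-8 octets into a list of code points.

    Raises DataError if the octets are not valid UTF-8 or if any decoded
    code point lies above WIDE_LAST.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise DataError("malformed UTF-8: %s" % exc) from exc

    code_points = [ord(ch) for ch in text]
    for cp in code_points:
        if cp > WIDE_LAST:
            raise DataError("U+%04X cannot be represented" % cp)
    return code_points


# The same text in two encodings yields the same code points.
sample = "Gr\u00fc\u00dfe"
from_utf8 = decode_as_wide(sample.encode("utf-8"))
from_utf16 = [ord(ch) for ch in sample.encode("utf-16-le").decode("utf-16-le")]
assert from_utf8 == from_utf16
```

The octets differ between the two encodings, but the code points do not;
only a code point outside the assumed range (for example U+1F600) makes
decode_as_wide raise DataError.
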
I see no problem in implementing it this way and requiring such
implementations by the standard. For raw binary I/O there are streams and
direct I/O of Unsigned_8 or whatever octet/memory unit type.

> Personally the idea of holding on to encoded data in memory seems like a
> bad idea. I know some programming languages store strings internally in
> "UTF-8 format" but that never made sense to me. UTF-8 encoded data is
> binary data. It should be put into an array of bytes or have a new type
> for it. I definitely don't want to accidentally mix "normal" strings of
> (decoded) characters with UTF-8 encoded strings. I have a feeling,
> Dmitry, this is what you are also saying.

Yes, I too wish we had separate string types for UTF-8 and UTF-16.

It is IMO bad to mandate UTF-8 for Ada.Directories. Rather, it should be
extended with Wide_Wide_String versions, as should Ada.Text_IO and all other
packages where file names appear.

I would also have file paths, file names, file extensions etc. properly
typed, i.e. not as raw strings, but that is another story for another day.

-----------------------
* An alternative interpretation could be that Wide_String is a UCS-2
(+endianness specification) encoding. But that would be a bad idea for a
higher-level language such as Ada.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de