From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: Interpretation of extensions different from Unix/Linux?
Date: Tue, 18 Aug 2009 15:48:16 -0500
Message-ID: <h6f44m$d6n$1@munin.nbi.dk>
In-Reply-To: 6f80c882-fa03-4ca9-a53e-fae34cea160d@b15g2000yqd.googlegroups.com
"Adam Beneschan" <adam@irvine.com> wrote in message
news:6f80c882-fa03-4ca9-a53e-fae34cea160d@b15g2000yqd.googlegroups.com...
>On Aug 17, 3:28 pm, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>> The problem here is that String really is not the right type, but since
>> you can't have string literals for private types in Ada, you can't make it a
>> private type. (And if you could have string literals, it still couldn't be
>> used with the existing I/O packages, it would be way too incompatible.)
>
>That wouldn't even be an issue if UTF-8 were strictly a "storage
>format" as you called it above. If that were the case, you wouldn't
>need string literals for it. I think the problem is that UTF-8 is
>something of a hybrid. If all characters in the string are in the
>32..126 range, the "sequence of octets" stored in the UTF-8 string is
>identical to the graphic characters stored in a String. (UTF-8 was
>designed purposefully so that would happen.) In cases like that, it
>makes sense to use a string literal.
Well, the problem here is that it *always* makes sense to use a string
literal. That's how you specify what you want in storage in Ada.
I think Dmitry's point is that he'd rather always see explicit conversions.
The problem is that they don't work well -- exhibit A is unbounded strings.
That's especially true for the use-averse like me. I hate having to write:
    A_Str := Ada.Strings.Unbounded.To_Unbounded_String ("ABC");
and surely UTF-8 would be worse:
    A_Str := Ada.Strings.Unbounded_UTF_8.To_Unbounded_UTF_8_String ("ABC");
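The "ABC" literal is exactly the easy case Adam describes above: every character is in 32 .. 126, so the Latin-1 String and its UTF-8 octet sequence are identical. A minimal sketch that checks this, assuming an Ada 2012 compiler, whose Ada.Strings.UTF_Encoding.Strings package (standardized after this thread was written) converts Latin-1 String values to UTF-8 octets; the procedure name ASCII_Check is made up for illustration:

```ada
--  Sketch: 7-bit ASCII text is the same octet sequence in Latin-1 and UTF-8.
--  Assumes Ada 2012 (Ada.Strings.UTF_Encoding); ASCII_Check is hypothetical.
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Strings;

procedure ASCII_Check is
   package U8 renames Ada.Strings.UTF_Encoding.Strings;

   Latin_1 : constant String := "ABC";
   --  Encode converts a Latin-1 String to UTF-8 octets (no BOM by default).
   Octets  : constant String := U8.Encode (Latin_1);
begin
   --  All characters are in 32 .. 126, so both strings hold the same octets.
   Ada.Text_IO.Put_Line ("Identical: " & Boolean'Image (Latin_1 = Octets));
end ASCII_Check;
```

It prints "Identical: TRUE", since UTF-8 encodes each 7-bit character as the same single octet.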
..
>Also, I'm afraid that using String can backfire. If I understand it
>correctly, the decision was that the Name parameter of Text_IO.Open
>should be interpreted as a UTF-8 octet sequence even though it's a
>String. But the intent is to allow string literals. At some point,
>though, some poor innocent programmer in Germany or Spain is going to
>try to use a string literal (or a Latin-1 string variable) with an
>umlaut or an accented vowel in it and get totally screwed up since
>those characters don't represent themselves in UTF-8 encoding, and
>they'll end up puzzling over how their program created a file with a
>Chinese character in the middle of the name. (Yeah, I know, that's
>very unlikely; most likely the UTF-8 encoding will simply be invalid.)
I've been presuming that UTF-8 encoding started with a BOM or something like
that, else you couldn't tell it from regular Latin-1 encoding. It would be
hard to insert a BOM into a string literal by accident!
But I do agree that this issue needs some discussion.
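Adam's umlaut scenario and the BOM question can be sketched the same way. This again assumes Ada 2012's Ada.Strings.UTF_Encoding (Encode, Decode, BOM_8 and Encoding_Error come from that package); Umlaut_Check and Dump are hypothetical names. It shows that the Latin-1 u-umlaut (16#FC#) becomes two octets in UTF-8, that neither form begins with the three-octet UTF-8 BOM, and that handing the raw Latin-1 octets to a UTF-8 decoder is expected to fail rather than silently produce some other character:

```ada
--  Sketch of the "Mueller" file-name case. Assumes Ada 2012
--  (Ada.Strings.UTF_Encoding); Umlaut_Check and Dump are hypothetical.
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Strings;

procedure Umlaut_Check is
   package UE renames Ada.Strings.UTF_Encoding;
   package U8 renames Ada.Strings.UTF_Encoding.Strings;

   --  "Mueller" with u-umlaut in Latin-1: the umlaut is the single octet 16#FC#.
   Latin_1 : constant String := "M" & Character'Val (16#FC#) & "ller";
   --  Proper UTF-8 form: the umlaut becomes the two octets 16#C3# 16#BC#.
   Octets  : constant String := U8.Encode (Latin_1);

   --  Print each octet of S as a decimal number.
   procedure Dump (Label : String; S : String) is
   begin
      Ada.Text_IO.Put (Label);
      for C of S loop
         Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
      end loop;
      Ada.Text_IO.New_Line;
   end Dump;
begin
   Dump ("Latin-1:", Latin_1);   --  6 octets
   Dump ("UTF-8:  ", Octets);    --  7 octets

   --  A UTF-8 BOM is the octets 16#EF# 16#BB# 16#BF#; Latin_1 has none.
   Ada.Text_IO.Put_Line
     ("Starts with BOM: " & Boolean'Image
        (Latin_1'Length >= 3
         and then Latin_1 (Latin_1'First .. Latin_1'First + 2) = UE.BOM_8));

   --  Misreading the Latin-1 octets as UTF-8: 16#FC# followed by 'l' is not
   --  a valid UTF-8 sequence, so Decode is expected to raise Encoding_Error.
   --  The handler sits outside the inner block because an exception raised
   --  in a block's declarative part is not handled by that block.
   begin
      declare
         Result : constant String := U8.Decode (Latin_1);
      begin
         Ada.Text_IO.Put_Line
           ("Decoded" & Integer'Image (Result'Length) & " characters");
      end;
   exception
      when UE.Encoding_Error =>
         Ada.Text_IO.Put_Line ("Not valid UTF-8");
   end;
end Umlaut_Check;
```

Note that Encode only emits a BOM when its Output_BOM parameter is True, so the UTF-8 octets above carry no BOM either.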
(Also note that a major reason for this package is to make ASIS work; there
[as with I/O], we're stuck with existing routines that return Wide_Strings
that are not enough to handle all possible text.)
Randy.