From: ytomino <aghia05@gmail.com>
Subject: Re: Why no Ada.Wide_Directories?
Date: Mon, 17 Oct 2011 19:32:04 -0700 (PDT)
Message-ID: <409c81ab-bd54-493b-beb4-a0cca99ec306@p27g2000prp.googlegroups.com>
In-Reply-To: ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com

On Oct 18, 10:10 am, Adam Beneschan <a...@irvine.com> wrote:
> On Oct 17, 4:47 pm, ytomino <aghi...@gmail.com> wrote:
>
> > On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>
> > > Say what?
>
> > > Ada.Strings.Encoding (new in Ada 2012) uses a subtype of String to store
> > > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so
> > > was "a violation of the standard".
>
> > > The intent has always been that Open, Ada.Directories, etc. take UTF-8
> > > strings as an option. Presumably the implementation would use a Form to
> > > specify that the file names in UTF-8 form rather than Latin-1. (I wasn't
> > > able to find a reference for this in a quick search, but I know it has been
> > > talked about on several occasions.)
>
> > > One of the primary reasons that Ada.Strings.Encoding uses a subtype of
> > > String rather than a separate type is so that it can be passed to Open and
> > > the like.
>
> > > It's probably true that we should standardize on the Form needed to use
> > > UTF-8 strings in these contexts, or at least come up with Implementation
> > > Advice on that point.
>
> > > Randy.
>
> > Good news. Thanks for letting know.
> > My worry is decreased a little.
>
> > However, even if that is right, Form parameters are missing for many
> > subprograms.
> > Probably, All subprograms in Ada.Directories,
> > Ada.Directories.Hierarchical_File_Names, Ada.Command_Line,
> > Ada.Environment_Variables and other subprograms having Name parameter
> > or returning a file name should have Form parameter.
> > (For example, I do Open (X, Form => "UTF-8"). Which does Name (X)
> > returns UTF-8 or Latin-1?)
>
> > Moreover, in the future, we will always use I/O subprograms as UTF-8
> > mode if what you say is realized.
> > But other libraries in the standard are explicitly defined as Latin-1.
> > It's certain that Ada.Character.Handling.To_Upper breaks UTF-8.
>
> I have a feeling you're fundamentally confused about what UTF-8 is, as
> compared to "Latin-1". Latin-1 is a character mapping. It defines,
> for all integers in the range 0..255, what character that integer
> represents (e.g. 77 represents 'M', etc.). Unicode is a character
> mapping that defines characters for a much larger integer range. For
> integers in the range 0..255, the character represented in Unicode is
> the same as that in Latin-1; higher integers represent characters in
> other alphabets, other symbols, etc. Those mappings just tell you
> what symbols go with what numbers, and they don't say anything about
> how the numbers are supposed to be stored.
>
> UTF-8 is an encoding (representation). It defines, for each non-
> negative integer up to a certain point, what bits are used to
> represent that integer. The number of bits is not fixed. So even if
> you're working with characters all in the 0..255 range, some of those
> characters will be represented in 8 bits (one byte) and some will take
> 16 bits (two bytes).
>
> Because of this, it is not feasible to work with strings or characters
> in UTF-8 encoding. Suppose you declare a string
>
> S : String (1 .. 100);
>
> but you want it to be a UTF-8 string. How would that work? If you
> want to look at S(50), the computer would have to start at the
> beginning of the string and figure out whether each character is
> represented as 1 or 2 bytes. Nobody wants that.
>
> The only sane way to work with strings in memory is to use a format
> where every character is the same size (String if all your characters
> are in the 0..255 range, Wide_String for 0..65535, Wide_Wide_String
> for 0..2**32-1). Then, if you have a string of bytes in UTF-8 format,
> you convert it to a regular (Wide_)(Wide_)String with routines in
> Ada.Strings.UTF_Encoding; and it also has routines for converting
> regular strings to UTF-8 format. But you don't want to *keep* strings
> in memory and work with them in UTF-8 format. That's why it doesn't
> make sense to have string routines (like
> Ada.Strings.Equal_Case_Insensitive or Ada.Character_Handling.To_Upper)
> that work with UTF-8.
>
> Hope this solves your problem.
>
> -- Adam

I'm not confused; you are misreading me.

Of course, if an application always holds file names as Wide_Wide_String
and encodes them to UTF-8 only when calling I/O subprograms, as you
describe, things are very simple, and that is perhaps the intended
method. I understand that.

But where do these file names come from?
They are usually supplied on the command line or in a configuration
file written by the user.
They are probably encoded in UTF-8 if the OS locale is set to UTF-8.
So the subprograms in Ada.Command_Line need Form parameters, and it is
natural to keep the names in UTF-8.

(Some file systems, such as those on Linux, accept malformed byte
sequences as valid file names. Applications must not (cannot?) decode
and re-encode file names in that case. A file name that looks broken
may be the right one, depending on how the user has set the LANG
variable. The same applies to NTFS and HFS+: these file systems can
accept malformed UTF-16. Strictly speaking, an application should never
decode or encode file names. But Ada has decided that file names are
stored in String (according to Randy), so we have to give up on UTF-16
file systems.)
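
A minimal sketch of the round-trip problem described above, written in Python rather than Ada purely for illustration. The file name `caf\xe9.txt` and the helper names are hypothetical; the byte 0xE9 is "é" in Latin-1 but does not form valid UTF-8 here. The sketch only manipulates bytes in memory and does not touch a real file system.

```python
# A byte string that is a legal POSIX file name (no NUL, no '/'),
# but is not valid UTF-8: 0xE9 starts a multi-byte sequence that
# is never completed.
raw_name = b"caf\xe9.txt"


def strict_utf8_round_trip(name: bytes):
    """Decode as UTF-8 and encode again; return None if decoding fails."""
    try:
        return name.decode("utf-8").encode("utf-8")
    except UnicodeDecodeError:
        return None


def lossy_utf8_round_trip(name: bytes) -> bytes:
    """Decode with replacement characters, then encode again."""
    return name.decode("utf-8", errors="replace").encode("utf-8")


# Strict decoding rejects the name outright.
strict_result = strict_utf8_round_trip(raw_name)

# Lossy decoding "succeeds" but yields different bytes, so the
# re-encoded name no longer refers to the same file.
lossy_result = lossy_utf8_round_trip(raw_name)

# Interpreted as Latin-1, the very same bytes are a sensible name.
latin1_view = raw_name.decode("latin-1")
```

Under these assumptions the strict round trip fails and the lossy one changes the bytes, which is why an application that wants to reopen such a file has to carry the original byte sequence through unchanged.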

And in many other libraries and languages it is common for
text-processing functions to work on strings kept in their encoded
form. I do not necessarily want to reject the Ada way, but I feel your
opinion is prejudiced. In fact, it is not as difficult as you say.
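
As a rough illustration of processing text while it stays UTF-8 encoded, here is a small Python sketch (Python rather than Ada, and not taken from any library mentioned in this thread). All function names are hypothetical, and the input is assumed to be well-formed UTF-8. It relies on two properties of UTF-8: continuation bytes always have the bit pattern 10xxxxxx, and every byte of a multi-byte sequence is 0x80 or above, so ASCII bytes never appear inside one.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def code_point_count(data: bytes) -> int:
    """Count code points by counting the bytes that start a sequence."""
    return sum(1 for b in data if not is_continuation(b))


def code_point_offsets(data: bytes) -> list:
    """Byte offsets at which each code point begins."""
    return [i for i, b in enumerate(data) if not is_continuation(b)]


def ascii_upper(data: bytes) -> bytes:
    """Upper-case only ASCII letters a-z; all other bytes pass through.

    This cannot corrupt multi-byte sequences, because their bytes
    are all >= 0x80 and so never fall in the range 0x61..0x7A.
    """
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in data)


# "naive cafe" with i-diaeresis and e-acute, encoded as UTF-8.
sample = "na\u00efve caf\u00e9".encode("utf-8")

count = code_point_count(sample)      # code points, not bytes
offsets = code_point_offsets(sample)  # where each code point starts
upper = ascii_upper(sample)           # still valid UTF-8
```

Indexing by code point is linear in the length of the string here, which is the cost Adam points out; sequential operations such as scanning, searching for an ASCII delimiter, or ASCII-only case mapping do not need random access at all.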