comp.lang.ada
From: ytomino <aghia05@gmail.com>
Subject: Re: Why no Ada.Wide_Directories?
Date: Mon, 17 Oct 2011 19:32:04 -0700 (PDT)
Message-ID: <409c81ab-bd54-493b-beb4-a0cca99ec306@p27g2000prp.googlegroups.com>
In-Reply-To: ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com

On Oct 18, 10:10 am, Adam Beneschan <a...@irvine.com> wrote:
> On Oct 17, 4:47 pm, ytomino <aghi...@gmail.com> wrote:
>
> > On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>
> > > Say what?
>
> > > Ada.Strings.UTF_Encoding (new in Ada 2012) uses a subtype of String to store
> > > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so
> > > was "a violation of the standard".
>
> > > The intent has always been that Open, Ada.Directories, etc. take UTF-8
> > > strings as an option. Presumably the implementation would use a Form to
> > > specify that the file names are in UTF-8 form rather than Latin-1. (I wasn't
> > > able to find a reference for this in a quick search, but I know it has been
> > > talked about on several occasions.)
>
> > > One of the primary reasons that Ada.Strings.UTF_Encoding uses a subtype of
> > > String rather than a separate type is so that it can be passed to Open and
> > > the like.
>
> > > It's probably true that we should standardize on the Form needed to use
> > > UTF-8 strings in these contexts, or at least come up with Implementation
> > > Advice on that point.
>
> > >                                        Randy.
>
> > Good news. Thanks for letting me know.
> > I am a little less worried now.
>
> > However, even if that is right, Form parameters are missing from many
> > subprograms.
> > Probably all subprograms in Ada.Directories,
> > Ada.Directories.Hierarchical_File_Names, Ada.Command_Line,
> > Ada.Environment_Variables, and any other subprograms that take a Name
> > parameter or return a file name should have a Form parameter.
> > (For example, suppose I do Open (X, Form => "UTF-8"). Does Name (X) then
> > return UTF-8 or Latin-1?)
>
> > Moreover, if what you say is realized, in the future we will always use
> > the I/O subprograms in UTF-8 mode.
> > But other libraries in the standard are explicitly defined as Latin-1.
> > It's certain that Ada.Characters.Handling.To_Upper breaks UTF-8.
>
> I have a feeling you're fundamentally confused about what UTF-8 is, as
> compared to "Latin-1".  Latin-1 is a character mapping.  It defines,
> for all integers in the range 0..255, what character that integer
> represents (e.g. 77 represents 'M', etc.).  Unicode is a character
> mapping that defines characters for a much larger integer range.  For
> integers in the range 0..255, the character represented in Unicode is
> the same as that in Latin-1; higher integers represent characters in
> other alphabets, other symbols, etc.  Those mappings just tell you
> what symbols go with what numbers, and they don't say anything about
> how the numbers are supposed to be stored.
>
> UTF-8 is an encoding (representation).  It defines, for each non-
> negative integer up to a certain point, what bits are used to
> represent that integer.  The number of bits is not fixed.  So even if
> you're working with characters all in the 0..255 range, some of those
> characters will be represented in 8 bits (one byte) and some will take
> 16 bits (two bytes).
>
> Because of this, it is not feasible to work with strings or characters
> in UTF-8 encoding.  Suppose you declare a string
>
>    S : String (1 .. 100);
>
> but you want it to be a UTF-8 string.  How would that work?  If you
> want to look at S(50), the computer would have to start at the
> beginning of the string and figure out whether each character is
> represented as 1 or 2 bytes.  Nobody wants that.
>
> The only sane way to work with strings in memory is to use a format
> where every character is the same size (String if all your characters
> are in the 0..255 range, Wide_String for 0..65535, Wide_Wide_String
> for 0..2**31-1).  Then, if you have a string of bytes in UTF-8 format,
> you convert it to a regular (Wide_)(Wide_)String with routines in
> Ada.Strings.UTF_Encoding; and it also has routines for converting
> regular strings to UTF-8 format.  But you don't want to *keep* strings
> in memory and work with them in UTF-8 format.  That's why it doesn't
> make sense to have string routines (like
> Ada.Strings.Equal_Case_Insensitive or Ada.Characters.Handling.To_Upper)
> that work with UTF-8.
>
> Hope this solves your problem.
>
>                              -- Adam

I'm not confused. You are misreading me.

Of course, if applications always hold file names as Wide_Wide_String
and encode them to UTF-8 only when calling I/O subprograms, as you
say, then it is very simple, and perhaps that is the intended method. I
understand that.
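
For reference, a minimal sketch of that decode-at-the-boundary approach,
using the standard Ada 2012 package
Ada.Strings.UTF_Encoding.Wide_Wide_Strings. The procedure name
Boundary_Demo and the sample bytes are only illustrative assumptions; a
real program would take Raw from the command line or a file:

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Boundary_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  Sample input (assumed): "cafe" with an acute accent, as UTF-8
   --  bytes.  The accented letter (U+00E9) takes two bytes, C3 A9.
   Raw : constant UTF.UTF_8_String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#);

   --  Decode once at the boundary; from here on there is one array
   --  element per code point.
   Name : constant Wide_Wide_String :=
     UTF.Wide_Wide_Strings.Decode (Raw);

   --  Encode again only when the name goes back to an I/O subprogram.
   Back : constant UTF.UTF_8_String :=
     UTF.Wide_Wide_Strings.Encode (Name);
begin
   Ada.Text_IO.Put_Line ("bytes      :" & Integer'Image (Raw'Length));
   Ada.Text_IO.Put_Line ("characters :" & Integer'Image (Name'Length));
   Ada.Text_IO.Put_Line ("round trip : " & Boolean'Image (Back = Raw));
end Boundary_Demo;
```

The sample is five bytes long but only four characters long, which is
exactly the difference the two representations are arguing about.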

But where do these file names come from?
They are usually given on the command line or in a configuration file
(written by the user).
They are probably encoded in UTF-8 if the OS locale is set to UTF-8.
So Form parameters are necessary for the subprograms in
Ada.Command_Line, and it's natural to keep the names in UTF-8.

(Some systems, like Linux, accept invalidly encoded byte sequences as
correct file names. Applications must not (cannot?) decode/encode file
names in this case.
A broken file name may be the right file name, depending on how the user
has set the LANG variable.
The same is true of NTFS and HFS+: these file systems can accept broken
UTF-16. Strictly speaking, an application should never encode or
decode file names. But Ada has decided that file names are stored in
String (at least according to Randy). So we have to give up on UTF-16
file systems.)
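
A small sketch of why decoding such a name is a problem. The byte E9 is
"e with acute accent" in Latin-1, but on its own it is not a valid
UTF-8 sequence, so the standard Decode function should raise
Encoding_Error instead of returning a name. The procedure name and the
sample bytes are assumptions for illustration only:

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure Broken_Name_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  A file name as the OS might hand it over: "caf" followed by the
   --  single byte E9.  Valid in Latin-1, malformed as UTF-8.
   Raw : constant String := "caf" & Character'Val (16#E9#);
begin
   declare
      --  Decode is expected to fail here, because E9 announces a
      --  multi-byte sequence and no continuation bytes follow.
      Decoded : constant Wide_Wide_String :=
        UTF.Wide_Wide_Strings.Decode (Raw);
   begin
      Ada.Text_IO.Put_Line
        ("decoded" & Integer'Image (Decoded'Length) & " characters");
   end;
exception
   when UTF.Encoding_Error =>
      --  The original bytes are the only faithful form of this name.
      Ada.Text_IO.Put_Line ("not valid UTF-8; keep the raw bytes");
end Broken_Name_Demo;
```

An application that wants to open this file has to pass Raw through
untouched, because there is no decoded form to encode back.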

And in many other libraries and languages it's common for text
processing functions to work directly on encoded strings. I do not
necessarily want to reject the Ada way, but I feel your opinion is
prejudiced. In fact, it is not as difficult as you say.
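
As a rough illustration of working on the encoded form directly, here
is a sketch that counts the code points in a UTF-8 string without
decoding it. It relies only on the fact that continuation bytes have
the form 10xxxxxx. The names Scan_Demo, Is_Continuation and
Code_Point_Count are hypothetical, and the input is assumed to be
well-formed UTF-8:

```ada
with Ada.Text_IO;

procedure Scan_Demo is
   --  True for bytes of the form 10xxxxxx, which continue a
   --  multi-byte sequence instead of starting a new code point.
   function Is_Continuation (C : Character) return Boolean is
     (Character'Pos (C) in 16#80# .. 16#BF#);

   --  Number of code points in Item, assuming well-formed UTF-8:
   --  every byte that is not a continuation byte starts one.
   function Code_Point_Count (Item : String) return Natural is
      Count : Natural := 0;
   begin
      for C of Item loop
         if not Is_Continuation (C) then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Code_Point_Count;

   --  Sample input (assumed): "caf", the two bytes C3 A9, then "s".
   Sample : constant String :=
     "caf" & Character'Val (16#C3#) & Character'Val (16#A9#) & "s";
begin
   Ada.Text_IO.Put_Line
     ("bytes:" & Integer'Image (Sample'Length)
      & ", code points:" & Natural'Image (Code_Point_Count (Sample)));
end Scan_Demo;
```

The scan is linear, so indexing the Nth character costs more than it
does with Wide_Wide_String, but the string never has to be converted.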


