comp.lang.ada
 help / color / mirror / Atom feed
From: Adam Beneschan <adam@irvine.com>
Subject: Re: Why no Ada.Wide_Directories?
Date: Mon, 17 Oct 2011 18:10:35 -0700 (PDT)
Date: 2011-10-17T18:10:35-07:00	[thread overview]
Message-ID: <ffeeb5d0-5685-42ff-a141-72bea410f239@u10g2000prl.googlegroups.com> (raw)
In-Reply-To: 7156122c-b63f-487e-ad1b-0edcc6694a7a@u10g2000prl.googlegroups.com

On Oct 17, 4:47 pm, ytomino <aghi...@gmail.com> wrote:
> On Oct 18, 6:33 am, "Randy Brukardt" <ra...@rrsoftware.com> wrote:
>
>
>
>
>
>
>
> > Say what?
>
> > Ada.Strings.Encoding (new in Ada 2012) uses a subtype of String to store
> > UTF-8 encoded strings. As such, I'd find it pretty surprising if doing so
> > was "a violation of the standard".
>
> > The intent has always been that Open, Ada.Directories, etc. take UTF-8
> > strings as an option. Presumably the implementation would use a Form to
> > specify that the file names in UTF-8 form rather than Latin-1. (I wasn't
> > able to find a reference for this in a quick search, but I know it has been
> > talked about on several occasions.)
>
> > One of the primary reasons that Ada.Strings.Encoding uses a subtype of
> > String rather than a separate type is so that it can be passed to Open and
> > the like.
>
> > It's probably true that we should standardize on the Form needed to use
> > UTF-8 strings in these contexts, or at least come up with Implementation
> > Advice on that point.
>
> >                                        Randy.
>
> Good news. Thanks for letting know.
> My worry is decreased a little.
>
> However, even if that is right, Form parameters are missing for many
> subprograms.
> Probably, All subprograms in Ada.Directories,
> Ada.Directories.Hierarchical_File_Names, Ada.Command_Line,
> Ada.Environment_Variables and other subprograms having Name parameter
> or returning a file name should have Form parameter.
> (For example, I do Open (X, Form => "UTF-8"). Which does Name (X)
> returns UTF-8 or Latin-1?)
>
> Moreover, in the future, we will always use I/O subprograms as UTF-8
> mode if what you say is realized.
> But other libraries in the standard are explicitly defined as Latin-1.
> It's certain that Ada.Character.Handling.To_Upper breaks UTF-8.

I have a feeling you're fundamentally confused about what UTF-8 is, as
compared to "Latin-1".  Latin-1 is a character mapping.  It defines,
for all integers in the range 0..255, what character that integer
represents (e.g. 77 represents 'M', etc.).  Unicode is a character
mapping that defines characters for a much larger integer range.  For
integers in the range 0..255, the character represented in Unicode is
the same as that in Latin-1; higher integers represent characters in
other alphabets, other symbols, etc.  Those mappings just tell you
what symbols go with what numbers, and they don't say anything about
how the numbers are supposed to be stored.

UTF-8 is an encoding (representation).  It defines, for each non-
negative integer up to a certain point, what bits are used to
represent that integer.  The number of bits is not fixed.  So even if
you're working with characters all in the 0..255 range, some of those
characters will be represented in 8 bits (one byte) and some will take
16 bits (two bytes).

Because of this, it is not feasible to work with strings or characters
in UTF-8 encoding.  Suppose you declare a string

   S : String (1 .. 100);

but you want it to be a UTF-8 string.  How would that work?  If you
want to look at S(50), the computer would have to start at the
beginning of the string and figure out whether each character is
represented as 1 or 2 bytes.  Nobody wants that.

The only sane way to work with strings in memory is to use a format
where every character is the same size (String if all your characters
are in the 0..255 range, Wide_String for 0..65535, Wide_Wide_String
for 0..2**32-1).  Then, if you have a string of bytes in UTF-8 format,
you convert it to a regular (Wide_)(Wide_)String with routines in
Ada.Strings.UTF_Encoding; and it also has routines for converting
regular strings to UTF-8 format.  But you don't want to *keep* strings
in memory and work with them in UTF-8 format.  That's why it doesn't
make sense to have string routines (like
Ada.Strings.Equal_Case_Insensitive or Ada.Character_Handling.To_Upper)
that work with UTF-8.

Hope this solves your problem.

                             -- Adam



  reply	other threads:[~2011-10-18  1:18 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-14  6:58 Why no Ada.Wide_Directories? Michael Rohan
2011-10-14  7:39 ` Yannick Duchêne (Hibou57)
2011-10-14  9:07   ` Dmitry A. Kazakov
2011-10-14 12:48     ` Yannick Duchêne (Hibou57)
2011-10-14 12:54     ` Yannick Duchêne (Hibou57)
2011-10-15  1:06 ` ytomino
2011-10-15  6:55   ` Vadim Godunko
2011-10-15 12:34     ` ytomino
2011-10-15  8:38   ` Dmitry A. Kazakov
2011-10-15 13:12     ` Peter C. Chapin
2011-10-15 13:22       ` Ludovic Brenta
2011-10-15 14:47       ` Dmitry A. Kazakov
2011-10-16  5:48         ` Yannick Duchêne (Hibou57)
2011-10-17  0:15         ` Peter C. Chapin
2011-10-17  3:23           ` Yannick Duchêne (Hibou57)
2011-10-17  7:12           ` Simon Wright
2011-10-17  7:59           ` Dmitry A. Kazakov
2011-10-18 10:55             ` Peter C. Chapin
2011-10-18 12:27               ` Dmitry A. Kazakov
2011-10-16  5:51       ` Yannick Duchêne (Hibou57)
2011-10-17 21:41         ` Randy Brukardt
2011-10-18  7:29           ` Dmitry A. Kazakov
2011-10-18 14:06           ` Pascal Obry
2011-10-18 14:08             ` Pascal Obry
2011-10-19 21:32             ` Randy Brukardt
2011-10-17 21:33   ` Randy Brukardt
2011-10-17 23:47     ` ytomino
2011-10-18  1:10       ` Adam Beneschan [this message]
2011-10-18  2:32         ` ytomino
2011-10-18  4:46           ` ytomino
2011-10-18  9:32             ` Yannick Duchêne (Hibou57)
2011-10-18 10:00               ` Dmitry A. Kazakov
2011-10-18 10:06                 ` Yannick Duchêne (Hibou57)
2011-10-18 12:01                   ` Dmitry A. Kazakov
2011-10-18 15:02           ` Adam Beneschan
2011-10-18 15:16             ` Dmitry A. Kazakov
2011-10-18 23:42               ` Adam Beneschan
2011-10-19  8:12                 ` Dmitry A. Kazakov
2011-10-19 21:43               ` Randy Brukardt
2011-10-20  7:37                 ` Dmitry A. Kazakov
2011-10-20 11:04                   ` Yannick Duchêne (Hibou57)
2011-10-20 12:21                     ` Dmitry A. Kazakov
2011-10-20 12:38                       ` Yannick Duchêne (Hibou57)
2011-10-20 14:31                         ` Dmitry A. Kazakov
2011-10-20 15:54                           ` Yannick Duchêne (Hibou57)
2011-10-20 17:35                             ` Dmitry A. Kazakov
2011-10-21 12:53                               ` Yannick Duchêne (Hibou57)
2011-10-21 13:41                                 ` Dmitry A. Kazakov
2011-10-25 19:22                                   ` Randy Brukardt
2011-10-25 19:35                                     ` Dmitry A. Kazakov
2011-10-26 22:41                                       ` Randy Brukardt
2011-10-27  7:43                                         ` Dmitry A. Kazakov
2011-10-27 15:13                                           ` Yannick Duchêne (Hibou57)
2011-10-27 19:39                                             ` Robert A Duff
2011-10-27 21:09                                               ` Yannick Duchêne (Hibou57)
2011-10-28  7:50                                                 ` Dmitry A. Kazakov
2011-10-28  8:45                                                   ` Yannick Duchêne (Hibou57)
2011-10-28 14:59                                                     ` Dmitry A. Kazakov
2011-10-20 17:40                   ` J-P. Rosen
2011-10-20 18:43                     ` Dmitry A. Kazakov
2011-10-21 10:07                     ` Vadim Godunko
2011-10-21 11:25                       ` J-P. Rosen
2011-10-21 12:25                         ` Yannick Duchêne (Hibou57)
2011-10-21 13:13                         ` Dmitry A. Kazakov
2011-10-21 16:03                           ` Yannick Duchêne (Hibou57)
2011-10-21 18:34                             ` Dmitry A. Kazakov
2011-10-21 19:30                               ` Yannick Duchêne (Hibou57)
2011-10-21 20:02                                 ` Dmitry A. Kazakov
2011-10-21 20:36                                   ` Yannick Duchêne (Hibou57)
2011-10-22  7:54                                     ` Dmitry A. Kazakov
2011-10-22 20:28                                       ` Yannick Duchêne (Hibou57)
2011-10-22 22:23                                       ` Yannick Duchêne (Hibou57)
2011-10-23  7:53                                         ` Dmitry A. Kazakov
2011-10-25 19:16                                           ` Randy Brukardt
2011-10-21 18:55                         ` Vadim Godunko
2011-10-21 19:18                           ` J-P. Rosen
2011-10-21 19:41                           ` Yannick Duchêne (Hibou57)
2011-10-18 22:54             ` ytomino
2011-10-18  3:15         ` Yannick Duchêne (Hibou57)
2011-10-18  7:55         ` Dmitry A. Kazakov
2011-10-18  9:41           ` Yannick Duchêne (Hibou57)
2011-10-18 10:25           ` J-P. Rosen
2011-10-18 10:56             ` Yannick Duchêne (Hibou57)
2011-10-18 15:34           ` Adam Beneschan
2011-10-18 17:27             ` J-P. Rosen
2011-10-18 18:33               ` Adam Beneschan
2011-10-18 19:54               ` Yannick Duchêne (Hibou57)
2011-10-18  8:01       ` Dmitry A. Kazakov
2011-10-18  2:59     ` Yannick Duchêne (Hibou57)
2011-10-18  4:07       ` Michael Rohan
2011-10-18  4:54       ` ytomino
2011-10-18  9:54         ` Yannick Duchêne (Hibou57)
2011-10-18 10:52           ` ytomino
2011-10-18 11:02             ` Yannick Duchêne (Hibou57)
2011-10-18 21:18               ` ytomino
2011-10-18 10:10       ` J-P. Rosen
2011-10-22  6:32         ` Michael Rohan
2011-10-22  7:25           ` Yannick Duchêne (Hibou57)
2011-10-25 19:26           ` Randy Brukardt
2011-10-27 17:40 ` anon
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox