comp.lang.ada
From: "Dan'l Miller" <optikos@verizon.net>
Subject: Re: Strange crash on custom iterator
Date: Wed, 4 Jul 2018 07:43:24 -0700 (PDT)
Message-ID: <5611f9a5-508b-4846-9d53-4a05599f7f53@googlegroups.com>
In-Reply-To: <d35454dc-f982-49d7-b727-45a9cc69822b@googlegroups.com>

On Wednesday, July 4, 2018 at 9:37:40 AM UTC-5, Dan'l Miller wrote:
> On Wednesday, July 4, 2018 at 8:27:53 AM UTC-5, Dmitry A. Kazakov wrote:
> > On 2018-07-04 13:30, J-P. Rosen wrote:
> > > On 04/07/2018 at 12:01, Dmitry A. Kazakov wrote:
> > >> But UTF-8 is actually more efficient in most cases than
> > >> Wide_Wide_String. Random string indexing is practically never used.
> > > !!!! I, and many others, often need to search for substrings within a
> > > string; actually, I would have a hard time finding an example of string
> > > manipulation without indexing...
> > > 
> > >>> We discussed that point, and the agreement was that making a different
> > >>> type would force the user into many conversions that would bring nothing
> > >>> but trouble, and make Ada once again look impractical out of excessive
> > >>> purism.
> > >>
> > >> Exactly my point. Explicit conversions are necessary because Ada's type
> > >> system is unable to model strings in a type-safe way.
> > > So, you want different types, plus a typing system that would allow
> > > mixing the types and making them compatible.
> > 
> > Yes, because they are semantically same: arrays of code points.
> > 
> > > ... You might as well put
> > > everything in the same type!
> > 
> > No, because they must have different representations.
> > 
> > > Anyway, the ARG has to deal with Ada as it is, not as Dmitry dreams it
> > > should be...
> > 
> > It requires someone more influential, wise and knowledgeable than me to 
> > make and then push such a proposal. I would be satisfied if more people 
> > saw the roots of problems with strings etc.
> 
> I think that perhaps /all/ readers of this thread see at least one •problem• with UTF-8 in Ada's String (and perhaps in Wide_String and Wide_Wide_String too), and perhaps with Unicode/ISO10646 in Ada in general, regardless of the choice of encoding.
> 
> The difficulty is that •no one• has the single •solution• for this problem or these concomitant problems.  Not even J-P. Rosen possesses a complete solution in his Wide_Wide_String recommendation, because his replies seem to imply, incorrectly, that Unicode/ISO10646 has a fully normalized single-codepoint character for each grapheme/letter.  The following article provides 7 examples in 4 languages (2 of which are European languages, no less!) where a single grapheme's most compact representation in Unicode/ISO10646 is a multi-codepoint sequence.
> 
> By far the most infamous of these 7 examples is the Lithuanian one.  Through flukes of sociopolitical history, Vietnamese, French, German, and so forth all had pre-1992 ISO standards or IBM/Microsoft/Apple code pages for their letters with diacritics, so those letters got standardized in Unicode/ISO10646 as single codepoints, e.g., ü as U+00FC instead of u U+0075 followed by the combining diaeresis ¨ U+0308.  Poor old Lithuania was under Soviet occupation from 1944 to 1991, during which the Soviets tried to suppress the Lithuanian language.  Due to this suppression, the Soviet character-encoding standards never encoded the Lithuanian letters that carry Lithuanian-specific diacritical marks, such as the 2 example letters given in the article linked above.  Because the timespan was so short between the end of the Soviet occupation in 1991 and the 1992 cut-off for pre-existing character-encoding standards whose characters Unicode/ISO10646 must encode as single codepoints, poor old Lithuanian characters are 2nd-class citizens in Unicode/ISO10646, whereas all the Western European languages (and those of their former colonies) with diacritical marks are first-class citizens.  This has caused a protracted, slow-motion, multidecade trench war between Lithuania and Unicode/ISO10646 over this issue, made worse every time someone elsewhere on the planet whips up a brand-new character that has never existed in the history of humankind and then gets that contrived grapheme standardized as a single codepoint in Unicode/ISO10646.
> 
> Oh, but Japan and Silicon Valley have been able to devise emojis galore in recent years without being restricted by strict enforcement of this no-preexisting-character-encoding rule.  Why?  I guess because emojis are cool, but Lithuanian characters are booooorrrrrrrring.

Oh, it would help if I actually pressed the paste key:
http://unicode.org/standard/where
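The multi-codepoint point above can be checked with a short sketch. It is written in Python (a choice made for this illustration, not anything from the thread) because the standard `unicodedata.normalize` function exposes the NFC and NFD normalization forms directly. It compares German ü, which has a precomposed code point, with ą́ (a with ogonek plus a combining acute accent), used here as an assumed stand-in for the Lithuanian letters in the linked article, which are not reproduced in this post. For each form it reports the code-point count, the UTF-8 size (what an Ada String holding UTF-8 octets stores, one Character per byte) and the UTF-32 size (what a Wide_Wide_String stores, 4 bytes per Wide_Wide_Character). The helper names `summarize` and `show` are hypothetical.

```python
import unicodedata

def summarize(text: str, form: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-32 bytes) of `text` after normalization."""
    norm = unicodedata.normalize(form, text)
    # utf-32-le writes no byte-order mark, so each code point costs exactly 4 bytes,
    # like one Wide_Wide_Character in a Wide_Wide_String.
    return len(norm), len(norm.encode("utf-8")), len(norm.encode("utf-32-le"))

def show(label: str, text: str) -> None:
    """Print the code points and encoded sizes of `text` in NFC and NFD."""
    for form in ("NFC", "NFD"):
        norm = unicodedata.normalize(form, text)
        points = " ".join(f"U+{ord(ch):04X}" for ch in norm)
        count, utf8, utf32 = summarize(text, form)
        print(f"{label:15} {form}: {count} code point(s) [{points}], "
              f"UTF-8 {utf8} bytes, UTF-32 {utf32} bytes")

# u followed by COMBINING DIAERESIS: a precomposed form (U+00FC) exists.
show("u-umlaut", "u\u0308")
# a-ogonek followed by COMBINING ACUTE ACCENT: no precomposed form exists.
show("a-ogonek-acute", "\u0105\u0301")
```

Under NFC the first letter collapses to the single code point U+00FC, while the second stays at two code points (U+0105 U+0301) because Unicode has no precomposed form for it. So even in a Wide_Wide_String, one index position does not always correspond to one letter.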


Thread overview: 73+ messages
2018-06-30 10:48 Strange crash on custom iterator Lucretia
2018-06-30 11:32 ` Simon Wright
2018-06-30 12:02   ` Lucretia
2018-06-30 14:25     ` Simon Wright
2018-06-30 14:33       ` Lucretia
2018-06-30 19:25         ` Simon Wright
2018-06-30 19:36           ` Luke A. Guest
2018-07-01 18:06             ` Jacob Sparre Andersen
2018-07-01 19:59               ` Simon Wright
2018-07-02 17:43                 ` Luke A. Guest
2018-07-02 19:42                   ` Simon Wright
2018-07-03 14:08                     ` Lucretia
2018-07-03 14:17                       ` J-P. Rosen
2018-07-03 15:06                         ` Lucretia
2018-07-03 15:45                           ` J-P. Rosen
2018-07-03 15:55                             ` Lucretia
2018-07-03 17:00                               ` J-P. Rosen
2018-07-03 15:57                             ` Dmitry A. Kazakov
2018-07-03 16:07                               ` Lucretia
2018-07-03 16:36                                 ` Dmitry A. Kazakov
2018-07-03 16:42                                   ` Lucretia
2018-07-03 16:45                                     ` Lucretia
2018-07-03 20:18                                     ` Dmitry A. Kazakov
2018-07-03 21:04                                       ` Lucretia
2018-07-04  1:26                                         ` Dan'l Miller
2018-07-04  1:59                                           ` Lucretia
2018-07-04  7:37                                             ` Dmitry A. Kazakov
2018-07-04 12:46                                             ` Dan'l Miller
2018-07-04 13:37                                             ` Dennis Lee Bieber
2018-07-04  7:21                                         ` Dmitry A. Kazakov
2018-07-03 18:54                                   ` Dan'l Miller
2018-07-03 20:22                                     ` Dmitry A. Kazakov
2018-07-04  7:33                                   ` J-P. Rosen
2018-07-04  7:53                                     ` Dmitry A. Kazakov
2018-07-04  9:55                                       ` J-P. Rosen
2018-07-04 10:01                                         ` Dmitry A. Kazakov
2018-07-04 11:30                                           ` J-P. Rosen
2018-07-04 13:27                                             ` Dmitry A. Kazakov
2018-07-04 14:37                                               ` Dan'l Miller
2018-07-04 14:43                                                 ` Dan'l Miller [this message]
2018-07-04 14:57                                                 ` J-P. Rosen
2018-07-04 15:41                                                 ` Lucretia
2018-07-04 16:55                                                   ` Dan'l Miller
2018-07-04 18:01                                                     ` Shark8
2018-07-04 18:57                                                       ` Dmitry A. Kazakov
2018-07-04 19:53                                                         ` Shark8
2018-07-04 20:05                                                           ` Lucretia
2018-07-04 22:04                                                             ` Shark8
2018-07-05  0:12                                                               ` Dan'l Miller
2018-07-05  1:46                                                                 ` Shark8
2018-07-05  2:07                                                                   ` Luke A. Guest
2018-07-05 16:47                                                                     ` Shark8
2018-07-05 17:19                                                                       ` Dan'l Miller
2018-07-05 19:14                                                                         ` Shark8
2018-07-04 20:43                                                           ` Dmitry A. Kazakov
2018-07-04 17:51                                             ` Jacob Sparre Andersen
2018-07-04 18:06                                               ` Shark8
2018-07-04 18:59                                                 ` Dan'l Miller
2018-07-04 19:01                                                 ` Dmitry A. Kazakov
2018-07-05 18:08                                                   ` Randy Brukardt
2018-07-05 19:41                                                     ` Dmitry A. Kazakov
2018-07-04 21:00                                                 ` Jacob Sparre Andersen
2018-07-05 18:06                                               ` Randy Brukardt
2018-07-04 19:02                                       ` G. B.
2018-07-04 19:16                                         ` Dmitry A. Kazakov
2018-07-04 20:40                                           ` G. B.
2018-07-04 20:55                                             ` Dmitry A. Kazakov
2018-07-04 21:21                                               ` G.B.
2018-07-05  7:55                                                 ` Dmitry A. Kazakov
2018-07-06  8:28                                                   ` G.B.
2018-07-06  8:57                                                     ` Dmitry A. Kazakov
2018-07-02  8:31               ` Lucretia
2018-06-30 14:34       ` Lucretia