comp.lang.ada
From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Wed, 19 Oct 2016 16:20:12 +0200
Message-ID: <nu7ve9$r44$1@dont-email.me>
In-Reply-To: <nu7c29$1dsi$1@gioia.aioe.org>

On 19.10.16 10:49, Dmitry A. Kazakov wrote:

>> The misconception is to think that String is meant to be
>> Latin-1 String. String isn't Latin-1 String. Ada states
>> a *correspondence*, but no essence at all.
>
> 3.5.2
>
> "The predefined type Character is a character type whose values
> correspond to the 256 code positions of Row 00 (also known as Latin-1)
   ^^^^^^^^^^
> of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."

Exactly. It means the values aren't Latin-1; they correspond
to Latin-1 code positions. ("To be" /= "to correspond to".)


>>>> To get a subset U from a set S, you apply a constraint
>>>> to S. That's not (easily) expressible in Ada in this case.
>>>
>>> There is no such constraint at all. A-umlaut in Latin-1 is one
>>> character, in UTF-8 it is two characters.

A-Umlaut is a character, not a character-in-Some-Encoding-Form.
'€' is one, too, as are the four in "Łódź" that the man named
"Artiñano" (8 characters) could not manage to type into his letter
without accidentally spoiling his last name.
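
To make the "one character vs. two" counting concrete, here is a
minimal sketch, assuming an Ada 2012 compiler. The unit name
Count_Sketch and the object names are made up for illustration; Encode
is the function from Ada.Strings.UTF_Encoding.Strings (RM A.4.11). It
shows the same character once as a single Character value and once as
a UTF-8 encoding stored in a String:

```ada
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Text_IO;

procedure Count_Sketch is
   --  A-Umlaut as one value of type Character (code position 16#C4#).
   A_Umlaut : constant String := (1 => Character'Val (16#C4#));

   --  The same character, encoded as UTF-8 and stored in a String.
   Encoded : constant String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (A_Umlaut);
begin
   Ada.Text_IO.Put_Line
     ("Character values:" & Integer'Image (A_Umlaut'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 code units:" & Integer'Image (Encoded'Length));
end Count_Sketch;
```

The two lengths are 1 and 2. What changes is the number of String
elements used to represent the character, not the character itself.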


>> In Ada, A-Umlaut is not a character in Latin-1,
>
> It is. ISO/IEC 8859-1

For Ada ("essence" vs "correspondence"), A-Umlaut is not
a character in ISO/IEC 8859-1; rather, there exist correspondences
between A-Umlaut, the Ada Character, and ISO/IEC 8859-1.
And we "cannot do it in a typed way, that is the whole point".

> it is merely a deficiency of Ada type system to do it properly.
> We cannot do it with generics or constrained subtypes, so we drop typing
> to have at least something.

Ada can attach a constraining aspect (a predicate) to a type derived
from String, so as to formally specify the set of values of that type,
in a way similar to

   type US_Elevator is new Integer range -10 .. 500
      with
        Static_Predicate => US_Elevator /= 13;

For the type derived from String, the short, informal name of such a
computable, exact specification by a predicate is "UTF-8".

It gives one-way substitutability: you can use a value of the
derived type wherever you can use a value of type String, if
there ever is a need for doing so (e.g. dumb String'Write can be
reused after Convert-ing to UTF_8_String (encoding)).
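
A minimal sketch of what such a derived type could look like, assuming
Ada 2012. The names UTF_8_Sketch, Is_Valid_UTF_8 and UTF_8_Text are
hypothetical, not anything predefined; only Decode and Encoding_Error
come from Ada.Strings.UTF_Encoding (RM A.4.11). Validity is checked
here simply by trying to decode, which is one possible choice of
predicate, not the only one:

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
with Ada.Text_IO;

procedure UTF_8_Sketch is

   --  Ask for predicate checks explicitly; a compiler may ignore
   --  assertions by default.
   pragma Assertion_Policy (Check);

   --  Hypothetical helper: True if S can be decoded as UTF-8.
   function Is_Valid_UTF_8 (S : String) return Boolean is
   begin
      declare
         Decoded : constant Wide_Wide_String :=
           Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (S);
      begin
         return True;  --  decoding did not raise Encoding_Error
      end;
   exception
      when Ada.Strings.UTF_Encoding.Encoding_Error =>
         return False;
   end Is_Valid_UTF_8;

   --  Hypothetical type: String values constrained to valid UTF-8.
   type UTF_8_Text is new String
     with Dynamic_Predicate => Is_Valid_UTF_8 (String (UTF_8_Text));

   --  A-Umlaut (U+00C4) encoded in UTF-8: 16#C3#, 16#84#.
   Encoded : constant String :=
     (Character'Val (16#C3#), Character'Val (16#84#));

   --  The predicate is checked on this conversion.
   Text : constant UTF_8_Text := UTF_8_Text (Encoded);

begin
   --  One-way use: convert to String to reuse String operations.
   Ada.Text_IO.Put_Line (String (Text));
end UTF_8_Sketch;
```

Going from UTF_8_Text to String is always possible with a type
conversion. Going the other way is subject to the predicate check, so
a String holding a lone 16#C4# (Latin-1 A-Umlaut) would be expected to
fail it.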
