From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Wed, 19 Oct 2016 16:20:12 +0200
Message-ID: <nu7ve9$r44$1@dont-email.me>
In-Reply-To: <nu7c29$1dsi$1@gioia.aioe.org>
On 19.10.16 10:49, Dmitry A. Kazakov wrote:
>> The misconception is to think that String is meant to be
>> Latin-1 String. String isn't Latin-1 String. Ada states
>> a *correspondence*, but no essence at all.
>
> 3.5.2
>
> "The predefined type Character is a character type whose values
> correspond to the 256 code positions of Row 00 (also known as Latin-1)
  ^^^^^^^^^^
> of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."
Exactly: it means the values aren't Latin-1; they correspond
to Latin-1 code points. (To be /= to correspond to.)
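A minimal sketch of what "correspond" amounts to in code, assuming an Ada 2005 or later compiler. The procedure name Correspondence_Demo and the constant names are made up for illustration. It only shows that the position number of a Character value agrees with the matching code position; it says nothing about how that value is encoded on disk or on the wire.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Characters.Conversions;

procedure Correspondence_Demo is
   --  A-Umlaut as a value of the predefined type Character.
   A_Umlaut : constant Character :=
     Ada.Characters.Latin_1.UC_A_Diaeresis;

   --  The same character as a value of Wide_Wide_Character.
   Wide : constant Wide_Wide_Character :=
     Ada.Characters.Conversions.To_Wide_Wide_Character (A_Umlaut);
begin
   --  Position numbers agree with code position 16#C4# of Row 00.
   pragma Assert (Character'Pos (A_Umlaut) = 16#C4#);
   pragma Assert
     (Wide_Wide_Character'Pos (Wide) = Character'Pos (A_Umlaut));

   Ada.Text_IO.Put_Line
     ("position:" & Integer'Image (Character'Pos (A_Umlaut)));
end Correspondence_Demo;
```

The assertions are only checked when assertion checks are enabled (for GNAT, e.g. with -gnata).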
>>>> To get a subset U from a set S, you apply a constraint
>>>> to S. That's not (easily) expressible in Ada in this case.
>>>
>>> There is no such constraint at all. A-umlaut in Latin-1 is one
>>> character, in UTF-8 it is two characters.
A-Umlaut is a character, not a character-in-Some-Encoding-Form.
'€' is one, too, as are the four in "Łódź" that the man named
"Artiñano" (8 characters) could not manage to type into his letter
without accidentally spoiling his last name.
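To make the character vs. code unit distinction concrete, here is a small sketch using the Ada 2012 package Ada.Strings.UTF_Encoding.Strings. The procedure name Count_Demo and the constant names are illustrative only. The name is built from code positions, so the result does not depend on how the source file itself is encoded.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Strings;

procedure Count_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  "Artiñano": eight characters, ñ being code position 16#F1#.
   Name : constant String :=
     "Arti" & Character'Val (16#F1#) & "ano";

   --  The same eight characters, encoded as UTF-8 code units.
   Encoded : constant UTF.UTF_8_String := UTF.Strings.Encode (Name);
begin
   Ada.Text_IO.Put_Line
     ("characters:      " & Integer'Image (Name'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 code units:" & Integer'Image (Encoded'Length));

   --  Decoding gives back the original characters.
   pragma Assert (UTF.Strings.Decode (Encoded) = Name);
end Count_Demo;
```

The name stays at 8 characters, while its UTF-8 encoding takes 9 code units, because ñ needs two of them.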
>> In Ada, A-Umlaut is not a character in Latin-1,
>
> It is. ISO/IEC 8859-1
For Ada, A-Umlaut is ("essence" vs "correspondence") not
a character in ISO/IEC 8859-1, but there exist correspondences
between A-Umlaut and the Ada Character and ISO/IEC 8859-1.
And we "cannot do it in a typed way, that is the whole point".
> it is merely a deficiency of Ada type system to do it properly.
> We cannot do it with generics or constrained subtypes, so we drop typing
> to have at least something.
Ada can add a constraining aspect to a type derived from String
so as to formally specify the set of values in that type.
In a way similar to
   type US_Elevator is new Integer range -10 .. 500
      with
         Static_Predicate => US_Elevator /= 13;
The short, informal name of that computable, exact specification
by a Predicate for the former type derived from String is "UTF-8".
It gives one-way substitutability: you can use a value of the
derived type wherever you can use a value of type String, if
there ever is a need for doing so (e.g. dumb String'Write can be
reused after Convert-ing to UTF_8_String (encoding)).
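A rough sketch of such a type, under some assumptions: the names UTF_8_Text, Is_UTF_8 and Predicate_Demo are invented here, the predicate is a Dynamic_Predicate (the check is not a static one), and well-formedness is delegated to the library's Decode, which is expected to raise Encoding_Error on malformed input. How strictly it does so may vary by implementation.

```ada
with Ada.Text_IO;
with Ada.Assertions;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Predicate_Demo is
   pragma Assertion_Policy (Check);  --  enable predicate checks

   package UTF renames Ada.Strings.UTF_Encoding;

   --  True if S decodes as UTF-8 (hypothetical helper).
   function Is_UTF_8 (S : String) return Boolean is
   begin
      declare
         Decoded : constant Wide_Wide_String :=
           UTF.Wide_Wide_Strings.Decode (S);
      begin
         return Decoded'Length >= 0;
      end;
   exception
      when UTF.Encoding_Error =>
         return False;
   end Is_UTF_8;

   --  A type derived from String whose values must be UTF-8.
   type UTF_8_Text is new String
     with Dynamic_Predicate => Is_UTF_8 (String (UTF_8_Text));

   --  "Artiñano" as eight Character values (not UTF-8).
   Name : constant String :=
     "Arti" & Character'Val (16#F1#) & "ano";

   Encoded : constant UTF.UTF_8_String := UTF.Strings.Encode (Name);
   Good    : constant UTF_8_Text := UTF_8_Text (Encoded);
begin
   --  One-way substitutability: converting to String always works.
   Ada.Text_IO.Put_Line
     ("code units:" & Integer'Image (String (Good)'Length));

   --  The other direction is checked against the predicate.
   begin
      declare
         Bad : constant UTF_8_Text := UTF_8_Text (Name);
      begin
         Ada.Text_IO.Put_Line
           ("accepted:" & Integer'Image (Bad'Length));
      end;
   exception
      when Ada.Assertions.Assertion_Error =>
         Ada.Text_IO.Put_Line ("rejected: not UTF-8");
   end;
end Predicate_Demo;
```

Converting the encoded value passes the predicate, while converting the raw eight-character String should fail it, since 16#F1# followed by 'a' is not a valid UTF-8 sequence.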