comp.lang.ada
From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Wed, 19 Oct 2016 10:15:58 +0200
Message-ID: <nu7a3b$i94$1@dont-email.me>
In-Reply-To: <nu5v60$1h81$1@gioia.aioe.org>

On 18.10.16 22:03, Dmitry A. Kazakov wrote:
> On 2016-10-18 19:35, G.B. wrote:
>> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>>
>> Who would ever want to do that?
>
> Somebody claiming that UTF-8 string is a constrained subtype of Latin-1 string.

But I do not claim this!

The misconception is to think that String is meant to be
Latin-1 String. String isn't Latin-1 String. Ada states
a *correspondence*, but no essence at all.

In fact, reading Japanese, or Polish, or Hebrew text would
be impossible to do in Ada if String were Latin-1!

Yes, character sets in Ada do not have types.

>> To get a subset U from a set S, you apply a constraint
>> to S. That's not (easily) expressible in Ada in this case.
>
> There is no such constraint at all. A-umlaut in Latin-1 is one character, in UTF-8 it is two characters.

In Ada, A-Umlaut is not a character in Latin-1.
In Ada, A-Umlaut is not a character in UTF-8.

Reason: Latin-1 and UTF-8 describe encoded forms, as do
KOI8-R, ISO-8859-15, Shift_JIS, or CP 1252. Some of these
also happen to list a repertoire of corresponding
characters, and some only indicate one.

A-Umlaut is a character, with a lower case c.

> To introduce a subtype relationship we need a conversion, not a constraint. Ada does not support this method of subtype construction.

An Ada subtype relationship is designed to avoid conversion,
and so it is distinguishable by its constraint and its name
only.

Where we would need conversion, were Ada to have
types for character sets and so on, we now have operations
such as Encode, Decode, and Convert, together with statements
of correspondence and normative references in the RM.
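A minimal sketch of these operations, assuming an Ada 2012
compiler and the standard package Ada.Strings.UTF_Encoding.Strings.
The procedure name Umlaut_Demo and the object names are made up for
illustration. It shows one value of type String holding A-Umlaut
as a single Character, and another value of the same type holding
its two-octet UTF-8 encoded form.

```ada
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Text_IO;

--  Illustrative only; Umlaut_Demo is a hypothetical name.
procedure Umlaut_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  One Character whose position (16#C4#) corresponds to A-Umlaut.
   Plain : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);

   --  Encode yields the UTF-8 encoded form, still a plain String.
   Encoded : constant UTF.UTF_8_String := UTF.Strings.Encode (Plain);

   --  Decode maps the encoded form back again.
   Decoded : constant String := UTF.Strings.Decode (Encoded);
begin
   --  Assertions are only checked when enabled (e.g. -gnata in GNAT).
   pragma Assert (Plain'Length = 1);
   pragma Assert (Encoded'Length = 2);
   pragma Assert (Character'Pos (Encoded (Encoded'First)) = 16#C3#);
   pragma Assert (Character'Pos (Encoded (Encoded'Last)) = 16#84#);
   pragma Assert (Decoded = Plain);

   Ada.Text_IO.Put_Line
     ("Plain length:" & Natural'Image (Plain'Length));
   Ada.Text_IO.Put_Line
     ("Encoded length:" & Natural'Image (Encoded'Length));
end Umlaut_Demo;
```

Both objects have type String; only the operations and the RM's
statements of correspondence say what the octets stand for.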

But neither prevents identifying a subset of the valid values
of the dumb type String that constitutes UTF_8_String,
or one that constitutes a to-be-defined (trivial) subtype
Latin_1_String.

    subtype UTF_8_String is String;
    --   RM blah blah ...

    subtype Latin_1_String is String;
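
One way such a subset of String values could be identified is
with an Ada 2012 Dynamic_Predicate. This is only a sketch under
assumptions: Subset_Demo, Is_Valid_UTF_8 and Checked_UTF_8_String
are hypothetical names, not anything the RM declares, and validity
is taken to mean "the standard Decode to Wide_Wide_String does not
raise Encoding_Error".

```ada
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Illustrative only; all names declared here are hypothetical.
procedure Subset_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  True if the standard decoder accepts S as UTF-8.
   function Is_Valid_UTF_8 (S : String) return Boolean is
   begin
      declare
         Unused : constant Wide_Wide_String :=
           UTF.Wide_Wide_Strings.Decode (S);
      begin
         return True;
      end;
   exception
      when UTF.Encoding_Error =>
         return False;
   end Is_Valid_UTF_8;

   --  A subset of the values of String, named and constrained by
   --  a predicate; no conversion is involved.
   subtype Checked_UTF_8_String is String
     with Dynamic_Predicate => Is_Valid_UTF_8 (Checked_UTF_8_String);

   --  A-Umlaut in UTF-8 encoded form: two octets.
   Two_Octets : constant String :=
     Character'Val (16#C3#) & Character'Val (16#84#);

   --  The single octet 16#C4#: not well-formed UTF-8 on its own.
   One_Octet : constant String := (1 => Character'Val (16#C4#));
begin
   --  Membership tests evaluate the predicate; the assertions
   --  themselves are only checked when enabled (e.g. -gnata).
   pragma Assert (Two_Octets in Checked_UTF_8_String);
   pragma Assert (One_Octet not in Checked_UTF_8_String);
   null;
end Subset_Demo;
```

The predicate selects values of String; it does not turn one
encoded form into another.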



-- 
"HOTDOGS ARE NOT BOOKMARKS"
Springfield Elementary teaching staff

