From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-0.3 required=5.0 tests=BAYES_00,
	REPLYTO_WITHOUT_TO_CC autolearn=no autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "G.B." <bauhaus@futureapps.invalid>
Newsgroups: comp.lang.ada
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Wed, 19 Oct 2016 10:15:58 +0200
Organization: A noiseless patient Spider
Message-ID: <nu7a3b$i94$1@dont-email.me>
References: <86f0d2fe-d498-4bc4-bb9d-e34629c89bb4@googlegroups.com>
 <nu3mkc$agg$1@dont-email.me> <nu4jnj$11va$1@gioia.aioe.org>
 <nu4m5k$g7g$1@dont-email.me> <nu4nee$18le$1@gioia.aioe.org>
 <nu4sbm$4m3$1@dont-email.me> <nu54af$1oo$1@gioia.aioe.org>
 <nu5e0p$54t$1@dont-email.me> <nu5j0s$sch$1@gioia.aioe.org>
 <nu5mgi$7er$1@dont-email.me> <nu5v60$1h81$1@gioia.aioe.org>
Reply-To: nonlegitur@futureapps.de
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 19 Oct 2016 08:15:39 -0000 (UTC)
Injection-Info: mx02.eternal-september.org;
 posting-host="dbaaa14e6bed1d902e348fb8e2991c6b";
	logging-data="18724"; mail-complaints-to="abuse@eternal-september.org";
	posting-account="U2FsdGVkX1/v+hct2uLyQWpT+y/YZPgdbTQvZaUAX+0="
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:45.0)
 Gecko/20100101 Thunderbird/45.4.0
In-Reply-To: <nu5v60$1h81$1@gioia.aioe.org>
Cancel-Lock: sha1:5IzMu9BsF3E7uMHpbfgMDZFal8k=
Xref: news.eternal-september.org comp.lang.ada:32131
Date: 2016-10-19T10:15:58+02:00
List-Id: <comp.lang.ada>

On 18.10.16 22:03, Dmitry A. Kazakov wrote:
> On 2016-10-18 19:35, G.B. wrote:
>> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>>
>> Who would ever want to do that?
>
> Somebody claiming that UTF-8 string is a constrained subtype of Latin-1 string.

But I do not claim this!

The misconception is to think that String is meant to be
Latin-1 String. String isn't Latin-1 String. Ada states
a *correspondence*, but no essence at all.

In fact, reading Japanese, or Polish, or Hebrew text would
be impossible in Ada if String were Latin-1!

Yes, character sets in Ada do not have types.

>> To get a subset U from a set S, you apply a constraint
>> to S. That's not (easily) expressible in Ada in this case.
>
> There is no such constraint at all. A-umlaut in Latin-1 is one character, in UTF-8 it is two characters.

In Ada, A-Umlaut is not a character in Latin-1.
In Ada, A-Umlaut is not a character in UTF-8.

Reason: Latin-1 and UTF-8 describe encoded forms, as do
KOI8-R, ISO-8859-15, Shift_JIS, or CP 1252. Some of them
merely happen to list a repertoire of corresponding
characters as well, and some only indicate one.

A-Umlaut is a character (lower-case c, as opposed to Ada's
type Character).
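
To make the "encoded forms" point concrete, here is a minimal
sketch, assuming an Ada 2012 compiler. The procedure name
A_Umlaut_Demo and the object names are made up for illustration;
Encode and UTF_8_String are the ones from
Ada.Strings.UTF_Encoding. The same abstract character shows up
as one String element in one encoded form and as two in the other
(octets 16#C3# 16#84#), and both objects are plain Strings.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;

procedure A_Umlaut_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  One String element holding position 196 (16#C4#), the value the
   --  RM puts in correspondence with LATIN CAPITAL LETTER A WITH
   --  DIAERESIS.
   Plain : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);

   --  The same abstract character in UTF-8 encoded form. The result
   --  is again just a String: UTF_8_String is a subtype of String.
   Encoded : constant UTF.UTF_8_String := UTF.Strings.Encode (Plain);
begin
   Ada.Text_IO.Put_Line
     ("Plain'Length   =" & Integer'Image (Plain'Length));
   Ada.Text_IO.Put_Line
     ("Encoded'Length =" & Integer'Image (Encoded'Length));

   --  Show the element positions of the encoded form.
   for C of Encoded loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (C)));
   end loop;
   Ada.Text_IO.New_Line;
end A_Umlaut_Demo;
```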

> To introduce a subtype relationship we need a conversion, not a constraint. Ada does not support this method of subtype construction.

An Ada-subtype relationship is designed to avoid conversion,
and so it is distinguishable by its constraint, and its name,
only.

Where we would need conversions, were Ada to have
types for character sets and so on, we now have operations
such as Encode, Decode, and Convert, together with statements
of correspondence and normative references in the RM.
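
A small sketch of those operations in use, again assuming
Ada 2012; Round_Trip_Demo and the object names are hypothetical,
the functions are the ones declared in the children of
Ada.Strings.UTF_Encoding. Every step is a function call on
Strings (or Wide_Strings), not a type conversion.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;
with Ada.Strings.UTF_Encoding.Conversions;

procedure Round_Trip_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  Three String elements: 'B', position 16#C4#, 'r'.
   Plain : constant String :=
     "B" & Ada.Characters.Latin_1.UC_A_Diaeresis & "r";

   --  Encode to UTF-8 (four elements), then Decode back.
   Encoded : constant UTF.UTF_8_String := UTF.Strings.Encode (Plain);
   Decoded : constant String := UTF.Strings.Decode (Encoded);

   --  Convert between encoded forms: UTF-8 to UTF-16 (three elements).
   Wide : constant UTF.UTF_16_Wide_String :=
     UTF.Conversions.Convert (Encoded);
begin
   Ada.Text_IO.Put_Line
     ("Encoded'Length =" & Integer'Image (Encoded'Length));
   Ada.Text_IO.Put_Line
     ("Wide'Length    =" & Integer'Image (Wide'Length));
   Ada.Text_IO.Put_Line
     ("Round trip ok  = " & Boolean'Image (Decoded = Plain));
end Round_Trip_Demo;
```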

But neither prevents identifying the subset of values
of the dumb type String that constitutes UTF_8_String,
or that of a to-be-defined (trivial) subtype Latin_1_String.

    subtype UTF_8_String is String;
    --   RM blah blah ...

    subtype Latin_1_String is String;
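
One way such a subset of String values might be identified, as a
sketch only: an Ada 2012 subtype predicate. Is_Well_Formed and
Checked_UTF_8_String are names I am making up here, they are not
in the RM, and using Decode as the validity test is my assumption
about how one could do it. No conversion is involved: the
membership test just asks whether a given String value belongs
to the subset.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Subset_Demo is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  Hypothetical helper: True if Item can be decoded as UTF-8.
   --  Decode raises Encoding_Error for ill-formed input.
   function Is_Well_Formed (Item : String) return Boolean is
   begin
      declare
         Ignored : constant Wide_Wide_String :=
           UTF.Wide_Wide_Strings.Decode (Item);
      begin
         return True;
      end;
   exception
      when UTF.Encoding_Error =>
         return False;
   end Is_Well_Formed;

   --  Hypothetical subtype: same type String, fewer valid values.
   subtype Checked_UTF_8_String is String
     with Dynamic_Predicate => Is_Well_Formed (Checked_UTF_8_String);

   --  A lone 16#C4# is not well-formed UTF-8 ...
   Latin_1_Octet : constant String := (1 => Character'Val (16#C4#));

   --  ... while 16#C3# 16#84# is.
   UTF_8_Octets : constant String :=
     Character'Val (16#C3#) & Character'Val (16#84#);
begin
   --  A membership test evaluates the predicate.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Latin_1_Octet in Checked_UTF_8_String));
   Ada.Text_IO.Put_Line
     (Boolean'Image (UTF_8_Octets in Checked_UTF_8_String));
end Subset_Demo;
```

The first line printed should be FALSE and the second TRUE.
Whether a predicate check on assignment is actually performed
depends on the assertion policy in effect, which is why the
sketch uses membership tests instead.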


-- 
"HOTDOGS ARE NOT BOOKMARKS"
Springfield Elementary teaching staff