From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Tue, 18 Oct 2016 18:35:42 +0200
Message-ID: <nu5j0s$sch$1@gioia.aioe.org>
In-Reply-To: nu5e0p$54t$1@dont-email.me
On 2016-10-18 17:10, G.B. wrote:
> On 18.10.16 14:24, Dmitry A. Kazakov wrote:
>
>>> still, any UTF-8 encoded "string" of UCS objects is wellformed
>>> and it satisfies a predicate that involves all components x, x', x'',
>>> ...
>>> of a UTF_8_String object, by stating that if x matches 2#10......#,
>>> then x' is such-and-such, and so on. I'm not sure this predicate
>>> is easily stated as a stand-alone type invariant, for example, but
>>> that's the idea. It shouldn't have to be visible to Ada programmers.
>>
>> Sorry, that is a meaningless set of words.
>
> Spelling out the look of model strings for type UTF_8_String
> cannot quite be meaningless.
>
> Ada Rationale:
>
> "Type invariants are designed for use with private types
> where we want some relationship to always hold between
> components of the type". (2.4)
That is completely irrelevant. No invariant can turn a Latin-1 A-umlaut
into a UTF-8 A-umlaut. The former is one octet (16#C4#), the latter is
two (16#C3# 16#84#).
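To make the A-umlaut point concrete, here is a minimal sketch, assuming an
Ada 2012 compiler. The procedure name A_Umlaut_Demo and the object names are
made up for illustration; Ada.Characters.Latin_1 and
Ada.Strings.UTF_Encoding.Strings are the standard library packages.

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;

procedure A_Umlaut_Demo is
   use Ada.Strings.UTF_Encoding;

   --  One Latin-1 character: A-umlaut, code position 196 (16#C4#).
   Latin_1_Text : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);

   --  The same character encoded as UTF-8: octets 195 and 132
   --  (16#C3# 16#84#).
   UTF_8_Text : constant UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1_Text);
begin
   --  Lengths differ: 1 element versus 2 elements.
   Ada.Text_IO.Put_Line
     ("Latin-1 length:" & Integer'Image (Latin_1_Text'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 length:  " & Integer'Image (UTF_8_Text'Length));

   --  Print each octet of the UTF-8 form as a number.
   for C of UTF_8_Text loop
      Ada.Text_IO.Put_Line (Integer'Image (Character'Pos (C)));
   end loop;

   --  UTF_8_String is declared as a subtype of String, so the comparison
   --  compiles, yet the two values are not equal.
   Ada.Text_IO.Put_Line
     ("Equal: " & Boolean'Image (Latin_1_Text = UTF_8_Text));
end A_Umlaut_Demo;
```

Both objects are legal String values as far as the compiler is concerned, but
they are different arrays of Character. No predicate on the second one changes
what its components are.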
>> Type constraint is put on type values.
>
> (Type values or a type's values?)
Values of a type. E.g. Positive is a constrained Integer.
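As a small illustration of what a constraint means here (a sketch only; the
procedure name Constraint_Demo and the variables N and P are invented for the
example): every Positive value is also an Integer value, and the constraint
merely excludes some Integer values.

```ada
with Ada.Text_IO;

procedure Constraint_Demo is
   N : Integer  := 42;
   P : Positive := N;  --  fine: 42 lies within Positive's range
begin
   N := P;             --  always fine: any Positive value is an Integer value
   N := -1;

   begin
      P := N;          --  -1 violates the constraint, Constraint_Error
      Ada.Text_IO.Put_Line ("not reached:" & Positive'Image (P));
   exception
      when Constraint_Error =>
         Ada.Text_IO.Put_Line ("-1 is an Integer value outside Positive");
   end;
end Constraint_Demo;
```

The constraint narrows the set of values; it never changes what a value means.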
> UTF_8_String does identify a subset of the values of
> type String, by intent,
No, it does not; that is why this implementation is broken. UTF-8
strings can be represented by String, they can be represented by Boolean
arrays or by indefinite integers or by polygons. That does not make
them a Boolean array subtype. No way.
>> Values of UTF-8 strings are not values of strings, as A-umlaut
>> promptly demonstrates. Period.
>
> Of course they can be (ASCII subset).
A-umlaut is not ASCII.
> But this is not otherwise reflected in the subtype, AFAICS.
> Considering
>
> UTF_String'("'Ä' is A-Umlaut");
>
> the literal, if taken at face value, doesn't say which characters
> there are going to be.
It does exactly this, once you define "character".
> It takes an Ada compiler to interpret the
> source text and decide whether it is representing Latin-1 or
> a multi-octet sequence, possibly one that needs Wide_Character
> or Wide_Wide_Character.
There is nothing to interpret if literals are considered to be of
Universal_String. It is no different from the way Universal_Integer is
handled. String, UTF-8 string and Wide string can all be considered
subtypes of Universal_String; that has no effect on the relationship
between String and UTF-8 string. The same holds if literals are
considered overloaded functions. No different.
>> "Remainders are values ... in Character" makes no sense either.
>
> Character'Val (N rem 256);
So what? Numeric characters are still a constrained subtype of the
character type.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de