From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Wed, 19 Oct 2016 10:49:14 +0200
Message-ID: <nu7c29$1dsi$1@gioia.aioe.org>
In-Reply-To: nu7a3b$i94$1@dont-email.me
On 19/10/2016 10:15, G.B. wrote:
> On 18.10.16 22:03, Dmitry A. Kazakov wrote:
>> On 2016-10-18 19:35, G.B. wrote:
>>> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>>>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>>>
>>> Who would ever want to do that?
>>
>> Somebody claiming that UTF-8 string is a constrained subtype of
>> Latin-1 string.
>
> But I do not claim this!
>
> The misconception is to think that String is meant to be
> Latin-1 String. String isn't Latin-1 String. Ada states
> a *correspondence*, but no essence at all.
ARM 3.5.2:
"The predefined type Character is a character type whose values
correspond to the 256 code positions of Row 00 (also known as Latin-1)
of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."
String means Latin-1. You can use it as if it meant something else, e.g.
UTF-8 string or UCS-2 string or PDP-11 machine code. That would prove
nothing except your willingness to go untyped.
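The correspondence quoted above can be checked mechanically. A minimal sketch, in Python rather than Ada purely for brevity (the names latin1_round_trip, utf8_matches_latin1 and utf8_agrees are made up for this illustration): every octet 0..255 read as Latin-1 yields the code point with the same number, while UTF-8 agrees with that mapping only for the 7-bit half.

```python
# Latin-1 maps each octet 0..255 to the Unicode code point with the same number.
latin1_round_trip = all(
    bytes([code]).decode("latin-1") == chr(code) for code in range(256)
)


def utf8_matches_latin1(code: int) -> bool:
    """True if UTF-8 encodes this code point as the single octet `code`."""
    return chr(code).encode("utf-8") == bytes([code])


# Code positions where UTF-8 and Latin-1 produce the same single octet.
utf8_agrees = [code for code in range(256) if utf8_matches_latin1(code)]

print(latin1_round_trip)                  # True
print(utf8_agrees == list(range(128)))    # True: only 0..127 coincide
```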
> In fact, reading Japanese, or Polish, or Hebrew text would
> be impossible to do in Ada if String was Latin-1!
The Polish alphabet is Latin-based, BTW.
Yes, you need to break the type system in order to re-interpret String
as a UTF-8 string. You cannot do it in a typed way; that is the whole
point. Latin-1 and UTF-8 strings are not subtypes of each other unless
you break types. Once you have done that, it no longer makes sense to
talk about subtypes. A subtype presumes keeping, if not all properties
(an LSP subtype), then at least some vital ones. Latin-1 strings
re-interpreted as UTF-8 keep almost none of the string properties.
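To make the loss of properties concrete, here is a small sketch, again in Python only for brevity and not as a statement about any Ada API (the variable names are illustrative). It encodes A-umlaut both ways and then re-interprets the octets of one encoding as if they belonged to the other.

```python
a_umlaut = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS

latin1_bytes = a_umlaut.encode("latin-1")  # one octet: C4
utf8_bytes = a_umlaut.encode("utf-8")      # two octets: C3 84

# Re-interpret the UTF-8 octets as Latin-1 characters, as untyped code would.
reinterpreted = utf8_bytes.decode("latin-1")

print(len(latin1_bytes), len(utf8_bytes))  # 1 2
print(len(reinterpreted))                  # 2: length is not preserved
print(reinterpreted == a_umlaut)           # False: equality is not preserved
print(reinterpreted[0] == "\u00c3")        # True: first octet reads as A-tilde

# The other direction: the Latin-1 octet alone is not even valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    latin1_is_valid_utf8 = True
except UnicodeDecodeError:
    latin1_is_valid_utf8 = False

print(latin1_is_valid_utf8)                # False
```

Length, indexing and comparison all give different answers after the re-interpretation, which is the sense in which almost no string property survives.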
>>> To get a subset U from a set S, you apply a constraint
>>> to S. That's not (easily) expressible in Ada in this case.
>>
>> There is no such constraint at all. A-umlaut in Latin-1 is one
>> character, in UTF-8 it is two characters.
>
> In Ada, A-Umlaut is not a character in Latin-1,
It is. See ISO/IEC 8859-1.
> Where we would be needing conversion, were Ada to have
> types for character sets and so on, we now have operations
> such as Encode, Decode, and Convert.
Yep, Ada goes for an untyped mess. Again, it is not ill will to make C
out of Ada; the Ada type system is simply too deficient to do it
properly. We cannot do it with generics or constrained subtypes, so we
drop typing to have at least something.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de