comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Wed, 19 Oct 2016 10:49:14 +0200
Message-ID: <nu7c29$1dsi$1@gioia.aioe.org>
In-Reply-To: nu7a3b$i94$1@dont-email.me

On 19/10/2016 10:15, G.B. wrote:
> On 18.10.16 22:03, Dmitry A. Kazakov wrote:
>> On 2016-10-18 19:35, G.B. wrote:
>>> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>>>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>>>
>>> Who would ever want to do that?
>>
>> Somebody claiming that UTF-8 string is a constrained subtype of
>> Latin-1 string.
>
> But I do not claim this!
>
> The misconception is to think that String is meant to be
> Latin-1 String. String isn't Latin-1 String. Ada states
> a *correspondence*, but no essence at all.

ARM 3.5.2 (Character Types):

"The predefined type Character is a character type whose values 
correspond to the 256 code positions of Row 00 (also known as Latin-1) 
of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."

String means Latin-1. You can use it as if it meant something else, e.g.
a UTF-8 string, a UCS-2 string, or PDP-11 machine code. That would prove
nothing except your willingness to go untyped.

> In fact, reading Japanese, or Polish, or Hebrew text would
> be impossible to do in Ada if String was Latin-1!

The Polish alphabet is Latin-based, BTW.

Yes, you need to break the type system in order to re-interpret String
as a UTF-8 string. You cannot do it in a typed way; that is the whole
point. Latin-1 and UTF-8 strings are not subtypes of each other unless
you break types. Once you have done that, it makes no sense to talk
about subtypes anymore. A subtype presumes keeping, if not all
properties (an LSP subtype), then at least some of the vital ones.
Latin-1 strings re-interpreted as UTF-8 keep almost none of the string
properties.
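
To make the point concrete, here is a minimal sketch, assuming an Ada
2012 compiler that provides Ada.Strings.UTF_Encoding. The procedure and
object names (Umlaut_Demo, Latin, Encoded) are made up for illustration.
It holds A-umlaut once as a Latin-1 String and once as its UTF-8
encoding, and compares what 'Length and iteration report:

```ada
with Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;

procedure Umlaut_Demo is
   --  One Character at code position 16#C4# (A-umlaut in Latin-1).
   Latin : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);

   --  The same text as UTF-8 octets, held in what is still a String.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin);
begin
   Ada.Text_IO.Put_Line
     ("Latin-1 length:" & Integer'Image (Latin'Length));
   Ada.Text_IO.Put_Line
     ("UTF-8 length:" & Integer'Image (Encoded'Length));

   --  Iterating over the encoded value yields octets, not characters.
   for C of Encoded loop
      Ada.Text_IO.Put_Line
        ("Octet:" & Integer'Image (Character'Pos (C)));
   end loop;
end Umlaut_Demo;
```

The Latin-1 value has length 1 and the encoded one has length 2 (octets
16#C3# and 16#84#), so length and indexing no longer refer to characters
once the String is read as UTF-8.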

>>> To get a subset U from a set S, you apply a constraint
>>> to S. That's not (easily) expressible in Ada in this case.
>>
>> There is no such constraint at all. A-umlaut in Latin-1 is one
>> character, in UTF-8 it is two characters.
>
> In Ada, A-Umlaut is not a character in Latin-1,

It is. See ISO/IEC 8859-1, where it sits at code position 16#C4#.

> Where we would be needing conversion, were Ada to have
> types for character sets and so on, we now have operations
> such as Encode, Decode, and Convert.

Yep, Ada becomes an untyped mess here. Again, it is not ill will to make
C out of Ada; it is merely that the Ada type system is too weak to do it
properly. We cannot do it with generics or constrained subtypes, so we
drop typing to have at least something.
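
A short sketch of what "untyped" means in practice, again assuming Ada
2012; the names Untyped_Demo, Wrong and Right and the local Encode
wrapper are hypothetical. Since UTF_8_String is declared as a plain
subtype of String, the compiler accepts passing encoded octets to an
operation that expects Latin-1 text:

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding.Strings;

procedure Untyped_Demo is
   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;

   --  Local shorthand for the standard Latin-1 to UTF-8 encoder.
   function Encode (Item : String) return UTF_8_String is
     (Ada.Strings.UTF_Encoding.Strings.Encode (Item));

   Latin   : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);
   Encoded : constant UTF_8_String := Encode (Latin);

   --  Accepted by the compiler: UTF_8_String is only a subtype of
   --  String, so To_Lower treats each octet as a Latin-1 character.
   Wrong : constant String :=
     Ada.Characters.Handling.To_Lower (Encoded);

   --  Lower-casing the Latin-1 text first and encoding afterwards.
   Right : constant UTF_8_String :=
     Encode (Ada.Characters.Handling.To_Lower (Latin));
begin
   Ada.Text_IO.Put_Line
     ("Same result: " & Boolean'Image (Wrong = Right));
end Untyped_Demo;
```

Nothing is rejected at compile time, yet the two results differ:
To_Lower maps the lead octet 16#C3# to 16#E3#, which is no longer the
encoding of a-umlaut. Whether the octets are Latin-1 or UTF-8 is tracked
only by convention and by explicit Encode/Decode calls.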

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
