* Bug in Ada - Latin 1 is not a subset of UTF-8
From: Lucretia @ 2016-10-17 20:18 UTC

Hi,

Whilst binding SDL_TTF functions, I was going to overload the TTF_Size* functions, but I couldn't do that because UTF_8_String is a subtype of String; String is Latin 1, and Latin 1 is not a subset of UTF-8 (ASCII is).

UTF_String should be implemented as an array like String, and then UTF_8_String should be a subtype of UTF_String, or a renaming, if that is the intent.

Luke.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Jacob Sparre Andersen @ 2016-10-17 20:57 UTC

Lucretia wrote:

> Whilst binding SDL_TTF functions, I was going to overload the TTF_Size*
> functions, but I couldn't do that because UTF_8_String is a subtype of
> String; String is Latin 1 and Latin 1 is not a subset of UTF-8, ASCII
> is.
>
> UTF_String should be implemented as an array like String and then
> UTF_8_String should be a subtype of UTF_String or a renaming, if that
> is the intent.

I think the best you can do is to ignore the subtypes declared in
Ada.Strings.UTF_Encoding (as they are just plain wrong), and declare
your own type for storing UTF-8 encoded strings.

Greetings,

Jacob
--
"There are only two types of data:
 Data which has been backed up
 Data which has not been lost - yet"
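A minimal sketch of Jacob's suggestion, assuming an Ada 2012 compiler. UTF_8_Text and Length_In_Code_Points are hypothetical names standing in for the TTF_Size* bindings; they are not part of SDL_TTF or the standard library. Because UTF_8_Text is a new type derived from String rather than a subtype, the two overloads are legal. If the second one took Ada.Strings.UTF_Encoding.UTF_8_String instead, it would be a homograph of the first and the compiler would reject it.

```ada
with Ada.Text_IO;

procedure Overload_Demo is
   --  Hypothetical distinct type for UTF-8 encoded octets.  Being a new
   --  type (not a subtype), it can overload subprograms taking String.
   type UTF_8_Text is new String;

   --  Latin-1 overload: every Character is one code point.
   function Length_In_Code_Points (Text : String) return Natural is
     (Text'Length);

   --  UTF-8 overload: count every octet that is not a continuation octet
   --  (continuation octets have the bit pattern 2#10xx_xxxx#).
   function Length_In_Code_Points (Text : UTF_8_Text) return Natural is
      Count : Natural := 0;
   begin
      for C of Text loop
         if Character'Pos (C) not in 2#1000_0000# .. 2#1011_1111# then
            Count := Count + 1;
         end if;
      end loop;
      return Count;
   end Length_In_Code_Points;

   --  A-umlaut: one octet (C4) in Latin-1, two octets (C3 84) in UTF-8.
   Latin_1 : constant String := (1 => Character'Val (16#C4#));
   UTF_8   : constant UTF_8_Text :=
     (Character'Val (16#C3#), Character'Val (16#84#));
begin
   Ada.Text_IO.Put_Line (Natural'Image (Length_In_Code_Points (Latin_1)));
   Ada.Text_IO.Put_Line (Natural'Image (Length_In_Code_Points (UTF_8)));
   --  A string literal matches both overloads, so it must be qualified.
   Ada.Text_IO.Put_Line
     (Natural'Image (Length_In_Code_Points (UTF_8_Text'("abc"))));
end Overload_Demo;
```

Both calls on A-umlaut should report one code point, although UTF_8'Length is 2. The qualification needed for the literal in the last call is an example of the extra type bookkeeping that a separate type brings.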
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: J-P. Rosen @ 2016-10-18 5:44 UTC

On 17/10/2016 22:57, Jacob Sparre Andersen wrote:
>> UTF_String should be implemented as an array like String and then
>> UTF_8_String should be a subtype of UTF_String or a renaming, if that
>> is the intent.
> I think the best you can do is to ignore the subtypes declared in
> Ada.Strings.UTF_Encoding (as they are just plain wrong), and declare
> your own type for storing UTF-8 encoded strings.

FWIW, the issue of whether to make UTF-8 a different type or a subtype
of String was discussed at the ARG. It was decided to make it a subtype,
basically on the grounds that:
1) In most cases, you need to read the beginning of a file (presumably
   with Text_IO) before you decide whether it is UTF-8 or not.
2) We feared that with a separate type, people would complain that
   "once again, Ada does it differently than other languages", and that
   it would involve many type conversions for no real benefit.

--
J-P. Rosen
Adalog
2 rue du Docteur Lombard, 92441 Issy-les-Moulineaux CEDEX
Tel: +33 1 45 29 21 52, Fax: +33 1 45 29 25 00
http://www.adalog.fr
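A minimal sketch of the first point, assuming Ada 2012. The octets are simulated with a constant instead of actually being read with Text_IO. Since UTF_8_String is only a subtype of String, the same String can be handed to Ada.Strings.UTF_Encoding.Encoding to look for a byte-order mark, and then to a Decode function, with no type conversions.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;                  use Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Sniff_Demo is
   --  Stand-in for octets just read from a file into a plain String:
   --  a UTF-8 BOM, then "A", then U+00C4 (A-umlaut) encoded as C3 84.
   Raw : constant String :=
     BOM_8 & "A" & Character'Val (16#C3#) & Character'Val (16#84#);

   --  Encoding accepts the String as is: UTF_String is just a subtype.
   Scheme : constant Encoding_Scheme := Encoding (Raw);
begin
   if Scheme = UTF_8 then
      declare
         --  Skip the BOM explicitly, then decode the rest to code points.
         Text : constant Wide_Wide_String :=
           Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode
             (Raw (Raw'First + BOM_8'Length .. Raw'Last));
      begin
         Ada.Text_IO.Put_Line
           ("UTF-8:" & Natural'Image (Raw'Length) & " octets," &
            Natural'Image (Text'Length) & " code points");
      end;
   else
      Ada.Text_IO.Put_Line ("not UTF-8");
   end if;
end Sniff_Demo;
```

The program should print "UTF-8: 6 octets, 2 code points".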
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-17 23:25 UTC

On 17.10.16 22:18, Lucretia wrote:
> Hi,
>
> Whilst binding SDL_TTF functions, I was going to overload the TTF_Size*
> functions, but I couldn't do that because UTF_8_String is a subtype of
> String; String is Latin 1 and Latin 1 is not a subset of UTF-8, ASCII is.
>
> UTF_String should be implemented as an array like String and then
> UTF_8_String should be a subtype of UTF_String or a renaming, if that
> is the intent.
>

According to ISO 10646, UTF stands for UCS Transformation
Format. So, it's a format, suggesting a representation.

On similar grounds, one could define a string subtype for
other types of objects, for example

   subtype Number_String is String;

The components represent the bits of the octets of the numbers
(base 256) in sequence, of whole numbers assumed to be phone
numbers. Each phone number is headed by a plus sign.

So, calling a taxi by telephone in Berlin, Dublin, or Ho Chi Minh
City might be helped by turning the string

   "+^@^A%??+^@^H9^K-+^@^S??^C"

into the respective numbers.

The intent, I guess, of UTF_String and its kin is to facilitate
reading and writing items of UCS.

--
"HOTDOGS ARE NOT BOOKMARKS"
Springfield Elementary teaching staff
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-18 7:41 UTC

On 18/10/2016 01:25, G.B. wrote:
> On 17.10.16 22:18, Lucretia wrote:
> According to ISO 10646, UTF stands for UCS Transformation
> Format. So, it's a format, suggesting a representation.
>
> On similar grounds, one could define a string subtype for
> other types of objects, for example
>
> subtype Number_String is String;

You are wrong. A string of numeric characters is not an encoding, it is
a constraint = (def) each instance of numeric string is a string. [An
example of an encoding (= representation) is IEEE 754 vs IBM 360 float.]

A UTF-8 string is not a constrained string, and conversely a string is
not a constrained UTF-8 string. These are two distinct types whose
values (some of them) overlap and can be converted into each other. The
latter allows making them subtypes, but the Ada language lacks the means
for that. In Ada a subtype can either be a constraint (AKA "Ada
subtype") or a class member / class-wide. UTF-8 is not a constraint and
String is not tagged.

The decision to force UTF-8 string and string [Latin-1 string to be
precise] to be subtypes in the Ada sense is the lesser of two evils. It
is bad and wrong, but the alternative would only be worse.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-18 8:23 UTC

On 18.10.16 09:41, Dmitry A. Kazakov wrote:
> On 18/10/2016 01:25, G.B. wrote:
>> On 17.10.16 22:18, Lucretia wrote:
>
>> According to ISO 10646, UTF stands for UCS Transformation
>> Format. So, it's a format, suggesting a representation.
>>
>> On similar grounds, one could define a string subtype for
>> other types of objects, for example
>>
>> subtype Number_String is String;
>
> You are wrong.

The constraints on either UTF_String or Number_String are
not expressible as simple Ada subtypes. They are given by
description and normative reference, respectively.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-18 8:45 UTC

On 18/10/2016 10:23, G.B. wrote:
> On 18.10.16 09:41, Dmitry A. Kazakov wrote:
>> On 18/10/2016 01:25, G.B. wrote:
>>> On 17.10.16 22:18, Lucretia wrote:
>>
>>> According to ISO 10646, UTF stands for UCS Transformation
>>> Format. So, it's a format, suggesting a representation.
>>>
>>> On similar grounds, one could define a string subtype for
>>> other types of objects, for example
>>>
>>> subtype Number_String is String;
>>
>> You are wrong.
>
> The constraints on either UTF_String or Number_String are
> not expressible as simple Ada subtypes. They are given by
> description and normative reference, respectively.

In the case of UTF-8 it is not a constraint. "Ä" has different
representations as Latin-1 and UTF-8 strings.

Numeric character is a constraint expressible in Ada:

   subtype Numeric is Character range '0'..'9';

Numeric string constraint is not expressible, but it is still a
constraint.
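The A-umlaut point can be checked with the Ada 2012 standard library. A minimal sketch: Ada.Strings.UTF_Encoding.Strings.Encode takes a String, interpreted as Latin-1, and returns its UTF-8 form, which again has subtype String. The two values stand for the same character, yet they differ in length and compare unequal under "=".

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding.Strings;

procedure A_Umlaut_Demo is
   package UTF_8_Strings renames Ada.Strings.UTF_Encoding.Strings;

   --  Latin-1 A-umlaut is the single Character at code position 16#C4#.
   Latin_1 : constant String := (1 => Character'Val (16#C4#));

   --  The same character encoded as UTF-8: still a String, but two octets.
   Encoded : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     UTF_8_Strings.Encode (Latin_1);
begin
   Ada.Text_IO.Put_Line ("Latin-1 length:" & Natural'Image (Latin_1'Length));
   Ada.Text_IO.Put_Line ("UTF-8 length:  " & Natural'Image (Encoded'Length));
   Ada.Text_IO.Put_Line ("Equal? " & Boolean'Image (Latin_1 = Encoded));
   --  Decoding restores the original Latin-1 value.
   Ada.Text_IO.Put_Line
     ("Round trip? " &
      Boolean'Image (UTF_8_Strings.Decode (Encoded) = Latin_1));
end A_Umlaut_Demo;
```

The lengths printed should be 1 and 2, "Equal?" should be FALSE, and the round trip should be TRUE.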
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-18 10:09 UTC

On 18.10.16 10:45, Dmitry A. Kazakov wrote:
> On 18/10/2016 10:23, G.B. wrote:
>> On 18.10.16 09:41, Dmitry A. Kazakov wrote:
>>> On 18/10/2016 01:25, G.B. wrote:
>>>> On 17.10.16 22:18, Lucretia wrote:
>>>
>>>> According to ISO 10646, UTF stands for UCS Transformation
>>>> Format. So, it's a format, suggesting a representation.
>>>>
>>>> On similar grounds, one could define a string subtype for
>>>> other types of objects, for example
>>>>
>>>> subtype Number_String is String;
>>>
>>> You are wrong.
>>
>> The constraints on either UTF_String or Number_String are
>> not expressible as simple Ada subtypes. They are given by
>> description and normative reference, respectively.
>
> In the case of UTF-8 it is not a constraint.

Not an Ada constraint, in particular insofar as UTF-8 means
a representation;
still, any UTF-8 encoded "string" of UCS objects is well-formed,
and it satisfies a predicate that involves all components x, x', x'', ...
of a UTF_8_String object, by stating that if x matches 2#10......#,
then x' is such-and-such, and so on. I'm not sure this predicate
is easily stated as a stand-alone type invariant, for example, but
that's the idea. It shouldn't have to be visible to Ada programmers.

>
> Numeric character is a constraint expressible in Ada:
>
> subtype Numeric is Character range '0'..'9';
>
> Numeric string constraint is not expressible, but it is still a constraint.

(Although, the Numeric_String subtype described earlier will have
a meaningless constraint on Numeric, since all remainders
are values both in base 256 and in Character. Come to think of it,
the example format is broken. #-)
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-18 12:24 UTC

On 18/10/2016 12:09, G.B. wrote:
> On 18.10.16 10:45, Dmitry A. Kazakov wrote:
>> On 18/10/2016 10:23, G.B. wrote:
>>> On 18.10.16 09:41, Dmitry A. Kazakov wrote:
>>>> On 18/10/2016 01:25, G.B. wrote:
>>>>> On 17.10.16 22:18, Lucretia wrote:
>>>>
>>>>> According to ISO 10646, UTF stands for UCS Transformation
>>>>> Format. So, it's a format, suggesting a representation.
>>>>>
>>>>> On similar grounds, one could define a string subtype for
>>>>> other types of objects, for example
>>>>>
>>>>> subtype Number_String is String;
>>>>
>>>> You are wrong.
>>>
>>> The constraints on either UTF_String or Number_String are
>>> not expressible as simple Ada subtypes. They are given by
>>> description and normative reference, respectively.
>>
>> In the case of UTF-8 it is not a constraint.
>
> Not an Ada constraint, in particular insofar as UTF-8 means
> a representation;
> still, any UTF-8 encoded "string" of UCS objects is well-formed,
> and it satisfies a predicate that involves all components x, x', x'', ...
> of a UTF_8_String object, by stating that if x matches 2#10......#,
> then x' is such-and-such, and so on. I'm not sure this predicate
> is easily stated as a stand-alone type invariant, for example, but
> that's the idea. It shouldn't have to be visible to Ada programmers.

Sorry, that is a meaningless set of words.

Type constraint is put on type values. Values of UTF-8 strings are not
values of strings, as A-umlaut promptly demonstrates. Period.

>> Numeric character is a constraint expressible in Ada:
>>
>> subtype Numeric is Character range '0'..'9';
>>
>> Numeric string constraint is not expressible, but it is still a constraint.
>
> (Although, the Numeric_String subtype described earlier will have
> a meaningless constraint on Numeric, since all remainders
> are values both in base 256 and in Character. Come to think of it,
> the example format is broken. #-)

"Remainders are values ... in Character" makes no sense either.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-18 15:10 UTC

On 18.10.16 14:24, Dmitry A. Kazakov wrote:

>> still, any UTF-8 encoded "string" of UCS objects is well-formed
>> and it satisfies a predicate that involves all components x, x', x'', ...
>> of a UTF_8_String object, by stating that if x matches 2#10......#,
>> then x' is such-and-such, and so on. I'm not sure this predicate
>> is easily stated as a stand-alone type invariant, for example, but
>> that's the idea. It shouldn't have to be visible to Ada programmers.
>
> Sorry, that is a meaningless set of words.

Spelling out the look of model strings for type UTF_8_String
cannot quite be meaningless.

Ada Rationale:

  "Type invariants are designed for use with private types
   where we want some relationship to always hold between
   components of the type". (2.4)

We want some relationship to always hold between components of
type UTF_8_String (if it were private, so that 2.4 might formally
apply):

   type UTF_Rep_Text is ...
     with Type_Invariant =>
       (...
          (case UTF_Rep_Text (K) is
              when 2#10_000000# .. 2#10_111111# =>
                 (case UTF_Rep_Text (K + 1) is
                     when 2#1_0000000# .. 2#1_1111111# =>
                        ...))
        ...);

> Type constraint is put on type values.

(Type values or a type's values?)

AI05-0146: "invariants apply to all values of a type, while
constraints are generally used to identify a subset of the
values of a type".

UTF_8_String does identify a subset of the values of type String,
by intent, even if it takes more of the RM to see that:

Since Strings are dumb insofar as they allow every value of type
Character as a component, a string that is a well-formed UTF-8
sequence U of octets (each octet appears as a Character) is in a
subset of type String's values. All are of finite length. These
well-formed sequences U establish a subset of all possible String
values. Call it UTF_8_String, not Unicode_String, nor UCS_String.

As said, I don't think that the set's predicate is easy to state.
With an aspect stating it, a purported UTF_8_String value that
isn't will be dropped from the set, perhaps as loudly as raising
Encoding_Error does now.

> Values of UTF-8 strings are not values of strings, as A-umlaut
> promptly demonstrates. Period.

Of course they can be (ASCII subset).

Also, the UTF_String thing is just a vague expression of what
RM A.4.11(47/3) states more specifically, for going from this
representation-oriented subtype to "real" characters from the UCS.
But this is not otherwise reflected in the subtype, AFAICS.
Considering

   UTF_String'("'Ä' is A-Umlaut");

the literal, if taken at face value, doesn't say which characters
there are going to be. It takes an Ada compiler to interpret the
source text and decide whether it is representing Latin-1 or
a multi-octet sequence, possibly one that needs Wide_Character
or Wide_Wide_Character.

> "Remainders are values ... in Character" makes no sense either.

   Character'Val (N rem 256);
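G.B.'s idea of stating the well-formedness predicate as an aspect can be sketched in Ada 2012. Is_Well_Formed_UTF_8 and Checked_UTF_8 are hypothetical names, not standard ones. The check is deliberately simplified. A lead octet 2#0xxx_xxxx# stands alone. Lead octets 2#110x_xxxx#, 2#1110_xxxx# and 2#1111_0xxx# are followed by one, two or three continuation octets of the form 2#10xx_xxxx#. Overlong forms, surrogates and the U+10FFFF limit are ignored, so this accepts a superset of strictly valid UTF-8. Attaching the check as a Dynamic_Predicate keeps Checked_UTF_8 a subtype of String. Predicate checks only run when the assertion policy is Check.

```ada
with Ada.Text_IO;
with Ada.Assertions;

procedure Predicate_Demo is
   pragma Assertion_Policy (Check);  --  predicate checks must be enabled

   --  Hypothetical, simplified structural check for UTF-8 octet sequences.
   function Is_Well_Formed_UTF_8 (Item : String) return Boolean is
      I     : Integer := Item'First;
      Extra : Natural;
   begin
      while I <= Item'Last loop
         case Character'Pos (Item (I)) is
            when 16#00# .. 16#7F# => Extra := 0;  --  2#0xxx_xxxx#
            when 16#C0# .. 16#DF# => Extra := 1;  --  2#110x_xxxx#
            when 16#E0# .. 16#EF# => Extra := 2;  --  2#1110_xxxx#
            when 16#F0# .. 16#F7# => Extra := 3;  --  2#1111_0xxx#
            when others           => return False;  --  stray continuation etc.
         end case;
         if I + Extra > Item'Last then
            return False;  --  truncated sequence
         end if;
         for J in I + 1 .. I + Extra loop
            if Character'Pos (Item (J)) not in 16#80# .. 16#BF# then
               return False;  --  expected 2#10xx_xxxx#
            end if;
         end loop;
         I := I + Extra + 1;
      end loop;
      return True;
   end Is_Well_Formed_UTF_8;

   subtype Checked_UTF_8 is String
     with Dynamic_Predicate => Is_Well_Formed_UTF_8 (Checked_UTF_8);

   Good    : constant Checked_UTF_8 :=
     "A" & Character'Val (16#C3#) & Character'Val (16#84#);
   Latin_1 : constant String := (1 => Character'Val (16#C4#));
begin
   Ada.Text_IO.Put_Line ("Good accepted:" & Natural'Image (Good'Length));
   declare
      Bad : constant Checked_UTF_8 := Latin_1;  --  predicate fails here
   begin
      Ada.Text_IO.Put_Line ("unexpected:" & Natural'Image (Bad'Length));
   end;
exception
   when Ada.Assertions.Assertion_Error =>
      Ada.Text_IO.Put_Line ("Latin-1 A-umlaut rejected by the predicate");
end Predicate_Demo;
```

No conversion is needed to pass a Checked_UTF_8 value where a String is expected. A value that fails the check raises Assertion_Error at the next predicate check point, rather than raising Encoding_Error later during decoding.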
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-18 16:35 UTC

On 2016-10-18 17:10, G.B. wrote:
> On 18.10.16 14:24, Dmitry A. Kazakov wrote:
>
>>> still, any UTF-8 encoded "string" of UCS objects is well-formed
>>> and it satisfies a predicate that involves all components x, x', x'',
>>> ...
>>> of a UTF_8_String object, by stating that if x matches 2#10......#,
>>> then x' is such-and-such, and so on. I'm not sure this predicate
>>> is easily stated as a stand-alone type invariant, for example, but
>>> that's the idea. It shouldn't have to be visible to Ada programmers.
>>
>> Sorry, that is a meaningless set of words.
>
> Spelling out the look of model strings for type UTF_8_String
> cannot quite be meaningless.
>
> Ada Rationale:
>
> "Type invariants are designed for use with private types
> where we want some relationship to always hold between
> components of the type". (2.4)

That is completely irrelevant. No invariant can make Latin-1 A-umlaut
UTF-8 A-umlaut.

>> Type constraint is put on type values.
>
> (Type values or a type's values?)

Values of a type. E.g. Positive is a constrained Integer.

> UTF_8_String does identify a subset of the values of
> type String, by intent,

No, it does not; that is why this implementation is broken. UTF-8
strings can be represented by String; they can be represented by
Boolean arrays or by indefinite integers or by polygons. That does not
make them a Boolean array subtype. No way.

>> Values of UTF-8 strings are not values of strings, as A-umlaut
>> promptly demonstrates. Period.
>
> Of course they can be (ASCII subset).

A-umlaut is not ASCII.

> But this is not otherwise reflected in the subtype, AFAICS.
> Considering
>
> UTF_String'("'Ä' is A-Umlaut");
>
> the literal, if taken at face value, doesn't say which characters
> there are going to be.

It does exactly this, once you define "character".

> It takes an Ada compiler to interpret the
> source text and decide whether it is representing Latin-1 or
> a multi-octet sequence, possibly one that needs Wide_Character
> or Wide_Wide_Character.

There is nothing to interpret considering literals of Universal_String.
It is no different from the way Universal_Integer is handled. String and
UTF-8 string and Wide string can be considered subtypes of
Universal_String; that has no effect on the relationship between
String and UTF-8 string. Same if literals are considered overloaded
functions. No different.

>> "Remainders are values ... in Character" makes no sense either.
>
> Character'Val (N rem 256);

So what? Numeric characters are still a constrained subtype of the
character type.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-18 17:35 UTC

On 18.10.16 18:35, Dmitry A. Kazakov wrote:
> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.

Who would ever want to do that? Before I/O, there is nothing.

UTF_8_String is for encoding and decoding subprograms of Ada.
For them to be successful, a predicate could be used to express
the set of values that can be parsed. It so happens that its
members are officially said to be in encoded form.

To get a subset U from a set S, you apply a constraint
to S. That's not (easily) expressible in Ada in this case.
But if it is, with the help of a predicate, then we can
say that UTF_8_String is-a "constrained" String because
their sets are.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-18 20:03 UTC

On 2016-10-18 19:35, G.B. wrote:
> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>
> Who would ever want to do that?

Somebody claiming that UTF-8 string is a constrained subtype of Latin-1
string.

> To get a subset U from a set S, you apply a constraint
> to S. That's not (easily) expressible in Ada in this case.

There is no such constraint at all. A-umlaut in Latin-1 is one
character, in UTF-8 it is two characters. To introduce a subtype
relationship we need a conversion, not a constraint. Ada does not
support this method of subtype construction.

> But if it is, with the help of a predicate, then we can
> say that UTF_8_String is-a "constrained" String because
> their sets are.

They are not, as demonstrated on the example of A-umlaut.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-19 8:15 UTC

On 18.10.16 22:03, Dmitry A. Kazakov wrote:
> On 2016-10-18 19:35, G.B. wrote:
>> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>>
>> Who would ever want to do that?
>
> Somebody claiming that UTF-8 string is a constrained subtype of
> Latin-1 string.

But I do not claim this!

The misconception is to think that String is meant to be
Latin-1 String. String isn't Latin-1 String. Ada states
a *correspondence*, but no essence at all.

In fact, reading Japanese, or Polish, or Hebrew text would
be impossible to do in Ada if String was Latin-1!

Yes, character sets in Ada do not have types.

>> To get a subset U from a set S, you apply a constraint
>> to S. That's not (easily) expressible in Ada in this case.
>
> There is no such constraint at all. A-umlaut in Latin-1 is one
> character, in UTF-8 it is two characters.

In Ada, A-Umlaut is not a character in Latin-1.
In Ada, A-Umlaut is not a character in UTF-8.

Reason: Latin-1 and UTF-8 describe encoded forms, as do KOI8-R,
ISO-8859-15, Shift_JIS, or CP 1252. Some of them happen to list,
and some only indicate, a repertoire of corresponding characters
as well. A-Umlaut is a character (with a lower-case "c").

> To introduce a subtype relationship we need a conversion, not a
> constraint. Ada does not support this method of subtype construction.

An Ada-subtype relationship is designed to avoid conversion,
and so it is distinguishable by its constraint, and its name, only.

Where we would be needing conversion, were Ada to have
types for character sets and so on, we now have operations
such as Encode, Decode, and Convert, together with statements
of correspondence and normative reference in the RM.

But neither of these prevents identifying a subset of valid values
of dumb type String that constitute the subset of UTF_8_String.
Or that of a to-be-defined (trivial) subtype Latin_1_String.

   type Latin_String is String;
   -- RM blah blah ...

   type Latin_1_String is String;
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-19 8:25 UTC

On 19.10.16 10:15, G.B. wrote:
> type Latin_String is String;
> -- RM blah blah ...
>
> type Latin_1_String is String;

subtype ...

Sorry.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-19 8:49 UTC

On 19/10/2016 10:15, G.B. wrote:
> On 18.10.16 22:03, Dmitry A. Kazakov wrote:
>> On 2016-10-18 19:35, G.B. wrote:
>>> On 18.10.16 18:35, Dmitry A. Kazakov wrote:
>>>> No invariant can make Latin-1 A-umlaut UTF-8 A-umlaut.
>>>
>>> Who would ever want to do that?
>>
>> Somebody claiming that UTF-8 string is a constrained subtype of
>> Latin-1 string.
>
> But I do not claim this!
>
> The misconception is to think that String is meant to be
> Latin-1 String. String isn't Latin-1 String. Ada states
> a *correspondence*, but no essence at all.

3.5.2

  "The predefined type Character is a character type whose values
   correspond to the 256 code positions of Row 00 (also known as
   Latin-1) of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."

String means Latin-1. You can use it as if it meant something else,
e.g. a UTF-8 string or UCS-2 string or PDP-11 machine code. That would
prove nothing except your willingness to go untyped.

> In fact, reading Japanese, or Polish, or Hebrew text would
> be impossible to do in Ada if String was Latin-1!

The Polish alphabet is Latin-based, BTW.

Yes, you need to break the type system in order to re-interpret String
as a UTF-8 string. You cannot do it in a typed way; that is the whole
point. Latin-1 and UTF-8 strings are not subtypes unless you break
types. Once you have done that, it does not make any sense to talk
about subtypes anymore. A subtype presumes keeping, if not all (LSP
subtype), then at least some of the vital properties. Latin-1 strings
re-interpreted as UTF-8 strings keep almost none of the string
properties.

>>> To get a subset U from a set S, you apply a constraint
>>> to S. That's not (easily) expressible in Ada in this case.
>>
>> There is no such constraint at all. A-umlaut in Latin-1 is one
>> character, in UTF-8 it is two characters.
>
> In Ada, A-Umlaut is not a character in Latin-1,

It is. ISO/IEC 8859-1

> Where we would be needing conversion, were Ada to have
> types for character sets and so on, we now have operations
> such as Encode, Decode, and Convert.

Yep, Ada goes the way of an untyped mess. Again, it is not ill will to
make C out of Ada; it is merely a deficiency of the Ada type system to
do it properly. We cannot do it with generics or constrained subtypes,
so we drop typing to have at least something.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: G.B. @ 2016-10-19 14:20 UTC

On 19.10.16 10:49, Dmitry A. Kazakov wrote:

>> The misconception is to think that String is meant to be
>> Latin-1 String. String isn't Latin-1 String. Ada states
>> a *correspondence*, but no essence at all.
>
> 3.5.2
>
> "The predefined type Character is a character type whose values
> correspond to the 256 code positions of Row 00 (also known as Latin-1)
  ^^^^^^^^^^
> of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."

Exactly: it means values aren't Latin-1, they correspond
to Latin-1 code points. (To be /= To correspond to.)

>>>> To get a subset U from a set S, you apply a constraint
>>>> to S. That's not (easily) expressible in Ada in this case.
>>>
>>> There is no such constraint at all. A-umlaut in Latin-1 is one
>>> character, in UTF-8 it is two characters.

A-Umlaut is a character, not a character-in-Some-Encoding-Form.
'€' is one, too, as are the four in "Łódź" that the man named
"Artiñano" (8 characters) could not manage to type into his letter
without accidentally spoiling his last name.

>> In Ada, A-Umlaut is not a character in Latin-1,
>
> It is. ISO/IEC 8859-1

For Ada, A-Umlaut is ("essence" vs "correspondence") not
a character in ISO/IEC 8859-1, but there exist correspondences
between A-Umlaut and the Ada Character and ISO/IEC 8859-1.
And we "cannot do it in a typed way, that is the whole point".

> it is merely a deficiency of Ada type system to do it properly.
> We cannot do it with generics or constrained subtypes, so we drop typing
> to have at least something.

Ada can add a constraining aspect to a type derived from String
so as to formally specify the set of values in that type.
In a way similar to

   type US_Elevator is new Integer range -10 .. 500
     with Static_Predicate => US_Elevator /= 13;

The short, informal name of that computable, exact specification
by a Predicate for the former type derived from String is "UTF-8".

It gives one-way substitutability: you can use a value of the
derived type wherever you can use a value of type String, if
there ever is a need for doing so (e.g. dumb String'Write can be
reused after Convert-ing to UTF_8_String (encoding)).
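A sketch of the kind of type G.B. describes, assuming Ada 2012 with assertion checks enabled: a new type derived from String, carrying a Dynamic_Predicate. To keep it short, the predicate defers to the standard decoder. A value is accepted exactly when Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode does not raise Encoding_Error. The names UTF_8_Text and Decodes are hypothetical. Because UTF_8_Text is a derived type and not a subtype, going to and from String takes explicit conversions.

```ada
with Ada.Text_IO;
with Ada.Assertions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Derived_Demo is
   pragma Assertion_Policy (Check);  --  predicate checks must be enabled

   --  Hypothetical validity test: accept exactly what the standard UTF-8
   --  decoder accepts (it raises Encoding_Error otherwise).
   function Decodes (Item : String) return Boolean is
   begin
      declare
         Decoded : constant Wide_Wide_String :=
           Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Decode (Item);
      begin
         --  Always true for valid input: a code point takes >= 1 octet.
         return Decoded'Length <= Item'Length;
      end;
   exception
      when Ada.Strings.UTF_Encoding.Encoding_Error =>
         return False;
   end Decodes;

   --  A new type derived from String, not a subtype of it.
   type UTF_8_Text is new String
     with Dynamic_Predicate => Decodes (String (UTF_8_Text));

   Octets  : constant String :=
     "A" & Character'Val (16#C3#) & Character'Val (16#84#);
   Latin_1 : constant String := (1 => Character'Val (16#C4#));

   --  Conversion in: the predicate is checked here.
   Text : constant UTF_8_Text := UTF_8_Text (Octets);

   --  Conversion out: where a String is expected, convert explicitly.
   Plain : constant String := String (Text);
begin
   Ada.Text_IO.Put_Line ("accepted," & Natural'Image (Plain'Length) & " octets");
   declare
      Bad : constant UTF_8_Text := UTF_8_Text (Latin_1);  --  fails here
   begin
      Ada.Text_IO.Put_Line ("unexpected:" & Natural'Image (Bad'Length));
   end;
exception
   when Ada.Assertions.Assertion_Error =>
      Ada.Text_IO.Put_Line ("Latin-1 A-umlaut rejected");
end Derived_Demo;
```

So substitutability holds only through the String (...) conversion. Passing a UTF_8_Text directly to a subprogram that expects String, such as String'Write, would not compile.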
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8
From: Dmitry A. Kazakov @ 2016-10-19 16:20 UTC

On 2016-10-19 16:20, G.B. wrote:
> On 19.10.16 10:49, Dmitry A. Kazakov wrote:
>
>>> The misconception is to think that String is meant to be
>>> Latin-1 String. String isn't Latin-1 String. Ada states
>>> a *correspondence*, but no essence at all.
>>
>> 3.5.2
>>
>> "The predefined type Character is a character type whose values
>> correspond to the 256 code positions of Row 00 (also known as Latin-1)
>    ^^^^^^^^^^
>> of the ISO/IEC 10646:2003 Basic Multilingual Plane (BMP)."
>
> Exactly: it means values aren't Latin-1, they correspond
> to Latin-1 code points. (To be /= To correspond to.)

They are. The language is necessarily sloppy for the sake of
simplicity. Values are Latin-1. The corresponding language character
objects (which are customarily called "values" too) correspond to, and
represent, these values. There is no reason to distinguish language
values from the problem-space values they represent, so long as there
is no confusion.

Anyway, it does not change anything in the discussion. The same objects
of String and UTF-8 String correspond to (represent) different
problem-space values. Sameness is defined as equality "=".

>>>>> To get a subset U from a set S, you apply a constraint
>>>>> to S. That's not (easily) expressible in Ada in this case.
>>>>
>>>> There is no such constraint at all. A-umlaut in Latin-1 is one
>>>> character, in UTF-8 it is two characters.
>
> A-Umlaut is a character, not a character-in-Some-Encoding-Form.

The text you quote states exactly that.

>>> In Ada, A-Umlaut is not a character in Latin-1,
>>
>> It is. ISO/IEC 8859-1
>
> For Ada, A-Umlaut is ("essence" vs "correspondence") not
> a character in ISO/IEC 8859-1, but there exist correspondences
> between A-Umlaut and the Ada Character and ISO/IEC 8859-1.

Ada character objects represent characters defined in ISO/IEC 8859-1.
For each object there is one and only one ISO/IEC 8859-1 character, and
conversely, for each ISO/IEC 8859-1 character there is one and only one
Ada character value.

> And we "cannot do it in a typed way, that is the whole point".
>
>> it is merely a deficiency of Ada type system to do it properly.
>> We cannot do it with generics or constrained subtypes, so we drop typing
>> to have at least something.
>
> Ada can add a constraining aspect to a type derived from String
> so as to formally specify the set of values in that type.

That won't be a string subtype, a property considered more important
than being a proper subtype. There is no language subtype that could
represent a subtype relationship between sequences of the *same*
characters having *different* encodings (representations). That is the
essence of the problem.

> The short, informal name of that computable, exact specification
> by a Predicate for the former type derived from String is "UTF-8".
>
> It gives one-way substitutability: you can use a value of the
> derived type wherever you can use a value of type String, if
> there ever is a need for doing so (e.g. dumb String'Write can be
> reused after Convert-ing to UTF_8_String (encoding)).

See the example with A-umlaut.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-18 8:45 ` Dmitry A. Kazakov 2016-10-18 10:09 ` G.B. @ 2016-10-20 0:31 ` Randy Brukardt 2016-10-20 7:36 ` Dmitry A. Kazakov 2016-10-28 21:08 ` Shark8 2 siblings, 1 reply; 30+ messages in thread From: Randy Brukardt @ 2016-10-20 0:31 UTC "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:nu4nee$18le$1@gioia.aioe.org... ... > Numeric character is a constraint expressible in Ada: > > subtype Numeric is Character range '0'..'9'; > > Numeric string constraint is not expressible, but it still a constraint. It's expressible as a predicate, though; that's the entire point of predicates (to act like user-defined constraints):

    subtype Numeric_String is String
      with Dynamic_Predicate => (for all E of Numeric_String => E in Numeric);

It's not 100% as good as a constraint (as modifications of individual components won't be checked), but it almost always will do the job. You also could declare a new type with the proper constraint:

    type Numeric_String is array (Positive range <>) of Numeric;

That will have all of the string operations, but it (unfortunately) can't be converted to String (you'd have to write a function to do that). Since both of these possibilities exist, I'd hardly call the constraint "not expressible". At worst, it's inconvenient to express it. Randy.
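Randy's two options can be compared in one small program. This is a sketch assuming Ada 2012. Digit_Array and To_String are illustrative names that do not appear in the post. The new array type cannot be converted to String because its component subtype (Numeric) does not statically match Character, so To_String copies the components one by one.

```ada
with Ada.Text_IO;

procedure Numeric_Strings is
   --  Predicates are only checked when the assertion policy is Check
   --  (with GNAT, the -gnata switch also enables them).
   pragma Assertion_Policy (Check);

   subtype Numeric is Character range '0' .. '9';

   --  Option 1: still a String subtype, constrained by a predicate
   subtype Numeric_String is String
     with Dynamic_Predicate =>
       (for all E of Numeric_String => E in Numeric);

   --  Option 2 (hypothetical name): a new array type whose components
   --  carry the constraint themselves
   type Digit_Array is array (Positive range <>) of Numeric;

   --  Explicit conversion back to String, component by component
   function To_String (D : Digit_Array) return String is
      Result : String (D'Range);
   begin
      for I in D'Range loop
         Result (I) := D (I);
      end loop;
      return Result;
   end To_String;

   S : constant Numeric_String := "2016";  --  predicate checked here
   D : constant Digit_Array    := "1020";  --  each component checked here
begin
   Ada.Text_IO.Put_Line (S & " " & To_String (D));
end Numeric_Strings;
```

Option 1 keeps full compatibility with String. Option 2 checks every component on assignment, but it requires an explicit conversion function.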
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-20 0:31 ` Randy Brukardt @ 2016-10-20 7:36 ` Dmitry A. Kazakov 2016-10-21 12:28 ` G.B. 2016-10-22 1:53 ` Randy Brukardt 0 siblings, 2 replies; 30+ messages in thread From: Dmitry A. Kazakov @ 2016-10-20 7:36 UTC On 20/10/2016 02:31, Randy Brukardt wrote: > "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message > news:nu4nee$18le$1@gioia.aioe.org... > ... >> Numeric character is a constraint expressible in Ada: >> >> subtype Numeric is Character range '0'..'9'; >> >> Numeric string constraint is not expressible, but it still a constraint. > > It's expressible as a predicate, though; that's the entire point of > predicates (to act like user-defined constraints): > > subtype Numeric_String is String > with Dynamic_Predicate => (for all E of Numeric_String => E in > Numeric); > > It's not 100% as good as a constraint (as modifications of individual > components won't be checked), but it almost always will do the job. Not nice. Is there a reason why, apart from premature optimization? > You also could declare a new type with the proper constraint: > type Numeric_String is array (Positive range <>) of Numeric; > > That will have all of the string operations, but it (unfortunately) can't be > converted to String (you'd have to write a function to do that). > > Since both of these possibilities exist, I'd hardly call the constraint "not > expressible". At worst, it's inconvinient to express it. Yes, maybe. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-20 7:36 ` Dmitry A. Kazakov @ 2016-10-21 12:28 ` G.B. 2016-10-21 16:13 ` Lucretia 2016-10-22 1:53 ` Randy Brukardt 1 sibling, 1 reply; 30+ messages in thread From: G.B. @ 2016-10-21 12:28 UTC On 20.10.16 09:36, Dmitry A. Kazakov wrote: > On 20/10/2016 02:31, Randy Brukardt wrote: >> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message >> news:nu4nee$18le$1@gioia.aioe.org... >> ... >>> Numeric character is a constraint expressible in Ada: >>> >>> subtype Numeric is Character range '0'..'9'; >>> >>> Numeric string constraint is not expressible, but it still a constraint. >> >> It's expressible as a predicate, though; that's the entire point of >> predicates (to act like user-defined constraints): >> >> subtype Numeric_String is String >> with Dynamic_Predicate => (for all E of Numeric_String => E in >> Numeric); >> >> It's not 100% as good as a constraint (as modifications of individual >> components won't be checked), but it almost always will do the job. > > Not nice. Is there a reason why, apart from premature optimization? I think you can add an aspect to the component type and have that checked on assignment to a component. The aspect may have to differ somehow from the constraint; also, just repeating it appears to loop infinitely with current GNAT versions. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78066 Anyway, a little inconvenience for starters:

    --  Needs the context clauses: with Ada.Unchecked_Conversion; with Interfaces;

    subtype My_Utf_8_String is String
       --  or, when not String, some array of any component type
       --  suitable as a byte sequence item type
       with Dynamic_Predicate => Is_Well_Formed (My_Utf_8_String);

    Bom : constant String :=
       String'(Character'Val (16#EF#), Character'Val (16#BB#), Character'Val (16#BF#));

    function Has_Bom (U8 : String) return Boolean is
       (U8'Length >= 3 and then U8 (U8'First .. U8'First + 2) = Bom);

    function "abs" is new Ada.Unchecked_Conversion (Character, Interfaces.Unsigned_8);

    function Is_Well_Formed (U8 : String) return Boolean is
       --  `U8` has permissible bit patterns for all bytes. (No Table 3.7
       --  support.)
       ((if U8'Length > 0 then
           (if Has_Bom (U8) then
               Is_Well_Formed (U8 (U8'First + 3 .. U8'Last))
            else
               (for all J in U8'Range =>
                  (case abs U8 (J) is
                     when 2#0_0000000# .. 2#0_1111111# =>
                        --  ASCII compatibility
                        True,
                     when 2#10_000000# .. 2#10_111111# =>
                        --  is a following byte: it must lie inside the sequence
                        --  started by a lead byte 1 to 3 positions back
                        (J - 1 >= U8'First
                           and then abs U8 (J - 1) in 2#110_00000# .. 2#11110_111#)
                        or else (J - 2 >= U8'First
                           and then abs U8 (J - 2) in 2#1110_0000# .. 2#11110_111#)
                        or else (J - 3 >= U8'First
                           and then abs U8 (J - 3) in 2#11110_000# .. 2#11110_111#),
                     when 2#110_00000# .. 2#110_11111# =>
                        (if J < U8'Last then
                            (abs U8 (J + 1) in 2#10_000000# .. 2#10_111111#)
                         else False),
                     when 2#1110_0000# .. 2#1110_1111# =>
                        (if J + 1 < U8'Last then
                            (for all K in J + 1 .. J + 2 =>
                               abs U8 (K) in 2#10_000000# .. 2#10_111111#)
                         else False),
                     when 2#11110_000# .. 2#11110_111# =>
                        (if J + 2 < U8'Last then
                            (for all K in J + 1 .. J + 3 =>
                               abs U8 (K) in 2#10_000000# .. 2#10_111111#)
                         else False),
                     when 2#11111_000# .. 2#11111_111# =>
                        --  not in Table 3.6 (UTF-8 Bit Distribution)
                        False)))
          --  String of length 0:
          else True));

    Test_Bom : constant My_Utf_8_String := Bom & "ABC";
    Test_US  : constant My_Utf_8_String := "ABC";
    --  The next two assume a UTF-8 source file read as Latin-1 (the GNAT
    --  default), so each literal holds the UTF-8 octets of the letters.
    Test_GR  : constant My_Utf_8_String := "ΑΒΓ";
    Test_RU  : constant My_Utf_8_String := "АБГ";
    --  Fails the predicate: 16#E4# starts a 3-byte sequence that is not completed.
    Test_Xx  : constant My_Utf_8_String :=
       ('A', Character'Val (16#E4#), 'E');
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-21 12:28 ` G.B. @ 2016-10-21 16:13 ` Lucretia 2016-10-21 16:43 ` Dmitry A. Kazakov 0 siblings, 1 reply; 30+ messages in thread From: Lucretia @ 2016-10-21 16:13 UTC On Friday, 21 October 2016 13:28:52 UTC+1, G.B. wrote: > Test_Bom : constant My_Utf_8_String := Bom & "ABC"; > Test_US : constant My_Utf_8_String := "ABC"; > Test_GR : constant My_Utf_8_String := "ΑΒΓ"; > Test_RU : constant My_Utf_8_String := "АБГ"; > Test_Xx : constant My_Utf_8_String := > ('A', Character'Val (16#E4#), 'E'); Also, the most inefficient string ever:

    Appended : My_UTF_8_String := "App";

    Appended := Some_Other_String & 'e'; -- Calls Is_Well_Formed for each assignment! Sloooooooooooooow
    Appended := Some_Other_String & 'n';
    Appended := Some_Other_String & 'd';
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-21 16:13 ` Lucretia @ 2016-10-21 16:43 ` Dmitry A. Kazakov 2016-10-22 5:51 ` G.B. 0 siblings, 1 reply; 30+ messages in thread From: Dmitry A. Kazakov @ 2016-10-21 16:43 UTC On 2016-10-21 18:13, Lucretia wrote: > On Friday, 21 October 2016 13:28:52 UTC+1, G.B. wrote: > >> Test_Bom : constant My_Utf_8_String := Bom & "ABC"; >> Test_US : constant My_Utf_8_String := "ABC"; >> Test_GR : constant My_Utf_8_String := "ΑΒΓ"; >> Test_RU : constant My_Utf_8_String := "АБГ"; >> Test_Xx : constant My_Utf_8_String := >> ('A', Character'Val (16#E4#), 'E'); > > Also, the most inefficient string ever: > > Appended : My_UTF_8_String := "App"; > > Appended := Some_Other_String & 'e'; -- Call's Is_Well_Formed for each assignment! Sloooooooooooooow > Appended := Some_Other_String & 'n'; > Appended := Some_Other_String & 'd'; For a proper UTF-8 string, no checks would ever be required when a character is appended. The above is a sorry mess of representation colliding with semantics, octets with characters: 'e' is a Latin-1 character appended as an octet, while a Unicode character is meant. Wrong design always gets punished one way or another. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
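Here is a sketch of what appending a proper character could look like. It assumes the character is held as a Wide_Wide_Character (a code point) and is encoded before concatenation with the standard Ada.Strings.UTF_Encoding.Wide_Wide_Strings.Encode. The Append function is hypothetical. Encode always produces complete octet sequences (no BOM by default), so appending its output to well-formed UTF-8 keeps the result well formed without re-validating the whole string.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Append_Demo is
   package WW renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;
   subtype UTF_8_String is Ada.Strings.UTF_Encoding.UTF_8_String;

   --  Hypothetical helper: encode one code point, then append its octets.
   --  Only the new character is encoded; the existing octets are untouched.
   function Append (S : UTF_8_String; C : Wide_Wide_Character)
     return UTF_8_String is (S & WW.Encode ((1 => C)));

   Text : constant UTF_8_String :=
     Append (Append ("App", 'e'), Wide_Wide_Character'Val (16#E4#));
begin
   --  'e' adds one octet and U+00E4 (a-umlaut) adds two: 3 + 1 + 2 = 6
   Ada.Text_IO.Put_Line ("Octets:" & Integer'Image (Text'Length));
end Append_Demo;
```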
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-21 16:43 ` Dmitry A. Kazakov @ 2016-10-22 5:51 ` G.B. 2016-10-22 7:49 ` Dmitry A. Kazakov 0 siblings, 1 reply; 30+ messages in thread From: G.B. @ 2016-10-22 5:51 UTC On 21.10.16 18:43, Dmitry A. Kazakov wrote: > For an UTF-8 string proper no checks would be ever required when a character is appanded. No UTF-encoded Unicode sequence should ever be visible in a program other than during parsing or during output. -- "HOTDOGS ARE NOT BOOKMARKS" Springfield Elementary teaching staff
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-22 5:51 ` G.B. @ 2016-10-22 7:49 ` Dmitry A. Kazakov 2016-10-24 11:35 ` Luke A. Guest 0 siblings, 1 reply; 30+ messages in thread From: Dmitry A. Kazakov @ 2016-10-22 7:49 UTC On 2016-10-22 07:51, G.B. wrote: > On 21.10.16 18:43, Dmitry A. Kazakov wrote: >> For an UTF-8 string proper no checks would be ever required when a >> character is appanded. > > No Unicode sequence in UTF should ever exist visibly in a > program other than either during parsing, or during output. Right. Any encoded string must implement two distinct interfaces: an array of characters and a sequence of encoding elements (e.g. octets). The two happen to fit each other for Latin-1 and UCS-2 strings, but for the majority of encoding methods they are drastically different. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
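The two interfaces can be seen side by side with the standard Ada.Strings.UTF_Encoding.Wide_Wide_Strings package. This is a minimal sketch, assuming Ada 2012, with Wide_Wide_String as the character view and UTF_8_String as the octet view. The word "Größe" is an arbitrary example: it has five characters and seven octets, and position 3 means something different in each view.

```ada
with Ada.Text_IO;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Two_Views is
   package WW renames Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

   --  Character view: "Größe" as five code points
   Chars : constant Wide_Wide_String :=
     ('G', 'r',
      Wide_Wide_Character'Val (16#F6#),   --  o-umlaut, U+00F6
      Wide_Wide_Character'Val (16#DF#),   --  sharp s, U+00DF
      'e');

   --  Octet view: the same text encoded as UTF-8
   Octets : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     WW.Encode (Chars);
begin
   Ada.Text_IO.Put_Line ("Characters:" & Integer'Image (Chars'Length));
   Ada.Text_IO.Put_Line ("Octets:    " & Integer'Image (Octets'Length));

   --  The third element differs between the views: a whole code point in
   --  one, only the first octet (16#C3#) of its encoding in the other.
   Ada.Text_IO.Put_Line
     ("Third character:" &
      Integer'Image (Wide_Wide_Character'Pos (Chars (Chars'First + 2))));
   Ada.Text_IO.Put_Line
     ("Third octet:    " &
      Integer'Image (Character'Pos (Octets (Octets'First + 2))));

   --  Decoding recovers the character view from the octet view.
   pragma Assert (WW.Decode (Octets) = Chars);
end Two_Views;
```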
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-22 7:49 ` Dmitry A. Kazakov @ 2016-10-24 11:35 ` Luke A. Guest 2016-10-24 13:01 ` Dmitry A. Kazakov 0 siblings, 1 reply; 30+ messages in thread From: Luke A. Guest @ 2016-10-24 11:35 UTC Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > On 2016-10-22 07:51, G.B. wrote: >> On 21.10.16 18:43, Dmitry A. Kazakov wrote: >>> For an UTF-8 string proper no checks would be ever required when a >>> character is appanded. >> >> No Unicode sequence in UTF should ever exist visibly in a >> program other than either during parsing, or during output. > > Right. > > Any encoded string must implement two distinct interfaces: an array of > characters and a sequence of encoding elements (e.g. octets). They > somehow fit to each other for Latin-1 and UCS-2 strings, but for > majority of encoding methods they are drastically different. > There's no such thing as a character; there are octets (for UTF-8) and code points. You should also implement grapheme cluster access.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-24 11:35 ` Luke A. Guest @ 2016-10-24 13:01 ` Dmitry A. Kazakov 2016-10-24 14:54 ` Luke A. Guest 0 siblings, 1 reply; 30+ messages in thread From: Dmitry A. Kazakov @ 2016-10-24 13:01 UTC On 24/10/2016 13:35, Luke A. Guest wrote: > Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: >> On 2016-10-22 07:51, G.B. wrote: >>> On 21.10.16 18:43, Dmitry A. Kazakov wrote: >>>> For an UTF-8 string proper no checks would be ever required when a >>>> character is appanded. >>> >>> No Unicode sequence in UTF should ever exist visibly in a >>> program other than either during parsing, or during output. >> >> Right. >> >> Any encoded string must implement two distinct interfaces: an array of >> characters and a sequence of encoding elements (e.g. octets). They >> somehow fit to each other for Latin-1 and UCS-2 strings, but for >> majority of encoding methods they are drastically different. > > There's no such thing as a character, there are octets for Utf-8 and code > points. Code points or Wide_Wide_Character, it does not really matter for practical applications. > You should also implement graphème clutter access too. Not really needed. -- Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-24 13:01 ` Dmitry A. Kazakov @ 2016-10-24 14:54 ` Luke A. Guest 0 siblings, 0 replies; 30+ messages in thread From: Luke A. Guest @ 2016-10-24 14:54 UTC Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote: > Code points or Wide_Wide_Character, it does not really matter for > practical applications. > >> You should also implement graphème clutter access too. > > Not really needed. > It is if you're doing rendering.
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-20 7:36 ` Dmitry A. Kazakov 2016-10-21 12:28 ` G.B. @ 2016-10-22 1:53 ` Randy Brukardt 1 sibling, 0 replies; 30+ messages in thread From: Randy Brukardt @ 2016-10-22 1:53 UTC "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message news:nu9s5v$18f0$1@gioia.aioe.org... > On 20/10/2016 02:31, Randy Brukardt wrote: ... >> It's not 100% as good as a constraint (as modifications of individual >> components won't be checked), but it almost always will do the job. > > Not nice. Is there a reason why, apart from premature optimization? Sure, checking after every component change would require passing some sort of checker subprogram with every reference parameter (since the actual could be part of some object that needs predicate checking). That sort of overhead would be completely unacceptable, especially as it would rarely be used. As such, we stuck with the model that the checks are made in the same places that whole object constraint checks are made (such as discriminant checks). For private types, there is no difference, but some failures might be detected late for types with visible components (like String). Randy.
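Here is a small sketch of the check points Randy describes. It assumes an Ada 2012 compiler that follows the RM on this point; the procedure names are hypothetical. Assigning to one component of a predicated String does not re-check the predicate. Passing the whole object to a parameter of that subtype does, because parameter passing converts the actual to the formal's subtype.

```ada
with Ada.Text_IO;
with Ada.Assertions;

procedure Predicate_Check_Points is
   pragma Assertion_Policy (Check);  --  enable predicate checks

   subtype Numeric_String is String
     with Dynamic_Predicate =>
       (for all E of Numeric_String => E in '0' .. '9');

   --  Hypothetical consumer: passing N converts it to Numeric_String,
   --  which is a predicate check point
   procedure Show (S : Numeric_String) is
   begin
      Ada.Text_IO.Put_Line ("Accepted: " & S);
   end Show;

   N : Numeric_String := "123";  --  checked at initialization
begin
   N (2) := 'x';  --  component update: the predicate is not re-checked
   Ada.Text_IO.Put_Line ("After component update: " & N);
   Show (N);      --  whole-object check fails here
exception
   when Ada.Assertions.Assertion_Error =>
      Ada.Text_IO.Put_Line ("Predicate failed when N was passed as a whole");
end Predicate_Check_Points;
```

This shows the "detected late" case: the bad component goes unnoticed until the next whole-object check.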
* Re: Bug in Ada - Latin 1 is not a subset of UTF-8 2016-10-18 8:45 ` Dmitry A. Kazakov 2016-10-18 10:09 ` G.B. 2016-10-20 0:31 ` Randy Brukardt @ 2016-10-28 21:08 ` Shark8 2 siblings, 0 replies; 30+ messages in thread From: Shark8 @ 2016-10-28 21:08 UTC On Tuesday, October 18, 2016 at 2:45:05 AM UTC-6, Dmitry A. Kazakov wrote: > On 18/10/2016 10:23, G.B. wrote: > > On 18.10.16 09:41, Dmitry A. Kazakov wrote: > >> On 18/10/2016 01:25, G.B. wrote: > >>> On 17.10.16 22:18, Lucretia wrote: > >> > >>> According to ISO 10646, UTF stands for UCS Transformation > >>> Format. So, it's a format, suggesting a representation. > >>> > >>> On similar grounds, one could define a string subtype for > >>> other types of objects, for example > >>> > >>> subtype Number_String is String; > >> > >> You are wrong. > > > > The constraints on either UTF_String or or Number_String are > > not expressible as simple Ada subtypes. They are given by > > description and normative reference, respectively. > > In the case of UTF-8 it is not a constraint. "Ä" has different > representations as Latin-1 and UTF-8 strings. > > Numeric character is a constraint expressible in Ada: > > subtype Numeric is Character range '0'..'9'; > > Numeric string constraint is not expressible, but it still a constraint. You are wrong; it is expressible:

    -- Using your Numeric character subtype.
    subtype Digit_String is String
      with Dynamic_Predicate => (for all Ch of Digit_String => Ch in Numeric);
Thread overview: 30+ messages 2016-10-17 20:18 Bug in Ada - Latin 1 is not a subset of UTF-8 Lucretia 2016-10-17 20:57 ` Jacob Sparre Andersen 2016-10-18 5:44 ` J-P. Rosen 2016-10-17 23:25 ` G.B. 2016-10-18 7:41 ` Dmitry A. Kazakov 2016-10-18 8:23 ` G.B. 2016-10-18 8:45 ` Dmitry A. Kazakov 2016-10-18 10:09 ` G.B. 2016-10-18 12:24 ` Dmitry A. Kazakov 2016-10-18 15:10 ` G.B. 2016-10-18 16:35 ` Dmitry A. Kazakov 2016-10-18 17:35 ` G.B. 2016-10-18 20:03 ` Dmitry A. Kazakov 2016-10-19 8:15 ` G.B. 2016-10-19 8:25 ` G.B. 2016-10-19 8:49 ` Dmitry A. Kazakov 2016-10-19 14:20 ` G.B. 2016-10-19 16:20 ` Dmitry A. Kazakov 2016-10-20 0:31 ` Randy Brukardt 2016-10-20 7:36 ` Dmitry A. Kazakov 2016-10-21 12:28 ` G.B. 2016-10-21 16:13 ` Lucretia 2016-10-21 16:43 ` Dmitry A. Kazakov 2016-10-22 5:51 ` G.B. 2016-10-22 7:49 ` Dmitry A. Kazakov 2016-10-24 11:35 ` Luke A. Guest 2016-10-24 13:01 ` Dmitry A. Kazakov 2016-10-24 14:54 ` Luke A. Guest 2016-10-22 1:53 ` Randy Brukardt 2016-10-28 21:08 ` Shark8