From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable autolearn_force=no version=3.4.4
Path: eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!feeder.eternal-september.org!aioe.org!.POSTED!not-for-mail
From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Tue, 18 Oct 2016 14:24:48 +0200
Organization: Aioe.org NNTP Server
Message-ID:
References: <86f0d2fe-d498-4bc4-bb9d-e34629c89bb4@googlegroups.com>
NNTP-Posting-Host: vZYCW951TbFitc4GdEwQJg.user.gioia.aioe.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Complaints-To: abuse@aioe.org
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
X-Notice: Filtered by postfilter v. 0.8.2
Xref: news.eternal-september.org comp.lang.ada:32118
Date: 2016-10-18T14:24:48+02:00
List-Id:

On 18/10/2016 12:09, G.B. wrote:
> On 18.10.16 10:45, Dmitry A. Kazakov wrote:
>> On 18/10/2016 10:23, G.B. wrote:
>>> On 18.10.16 09:41, Dmitry A. Kazakov wrote:
>>>> On 18/10/2016 01:25, G.B. wrote:
>>>>> On 17.10.16 22:18, Lucretia wrote:
>>>>
>>>>> According to ISO 10646, UTF stands for UCS Transformation
>>>>> Format. So, it's a format, suggesting a representation.
>>>>>
>>>>> On similar grounds, one could define a string subtype for
>>>>> other types of objects, for example
>>>>>
>>>>> subtype Number_String is String;
>>>>
>>>> You are wrong.
>>>
>>> The constraints on either UTF_String or Number_String are
>>> not expressible as simple Ada subtypes. They are given by
>>> description and normative reference, respectively.
>>
>> In the case of UTF-8 it is not a constraint.
>
> Not an Ada constraint, in particular insofar as UTF-8 means
> a representation;
> still, any UTF-8 encoded "string" of UCS objects is wellformed
> and it satisfies a predicate that involves all components x, x', x'', ...
> of a UTF_8_String object, by stating that if x matches 2#10......#,
> then x' is such-and-such, and so on. I'm not sure this predicate
> is easily stated as a stand-alone type invariant, for example, but
> that's the idea. It shouldn't have to be visible to Ada programmers.

Sorry, that is a meaningless set of words. A type constraint is put on
the values of the type. Values of UTF-8 strings are not values of
strings, as A-umlaut promptly demonstrates (one Character, 16#C4#, in
Latin-1; two octets, 16#C3# 16#84#, in UTF-8). Period.

>> Numeric character is a constraint expressible in Ada:
>>
>> subtype Numeric is Character range '0'..'9';
>>
>> Numeric string constraint is not expressible, but it is still a constraint.
>
> (Although, the Numeric_String subtype described earlier will have
> a meaningless constraint on Numeric, since all remainders
> are values both in base 256 and in Character. Come to think of it,
> the example format is broken. #-)

"Remainders are values ... in Character" makes no sense either.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
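The A-umlaut argument can be checked with the Ada 2012 standard package Ada.Strings.UTF_Encoding.Strings. Its Encode function returns a UTF_8_String, which the standard declares as a plain subtype of String. The sketch below is minimal. It assumes an Ada 2012 compiler such as GNAT, and the procedure name A_Umlaut_Demo is hypothetical. The Latin-1 value is one Character at position 196. The encoded value is a different String with two elements.

```ada
with Ada.Text_IO;                      use Ada.Text_IO;
with Ada.Characters.Latin_1;
with Ada.Strings.UTF_Encoding;
with Ada.Strings.UTF_Encoding.Strings;

--  Hypothetical demo procedure (name chosen for illustration only)
procedure A_Umlaut_Demo is
   --  A-umlaut as a Latin-1 String: one Character, position 196 (16#C4#)
   Latin_1_Text : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);

   --  The same text encoded as UTF-8: declared as a String subtype,
   --  but holding two elements (octets) instead of one
   UTF_8_Text : constant Ada.Strings.UTF_Encoding.UTF_8_String :=
     Ada.Strings.UTF_Encoding.Strings.Encode (Latin_1_Text);
begin
   Put_Line ("Latin-1 length:" & Natural'Image (Latin_1_Text'Length));  --  Latin-1 length: 1
   Put_Line ("UTF-8 length:" & Natural'Image (UTF_8_Text'Length));      --  UTF-8 length: 2

   --  Element positions: 196 for Latin-1; 195 and 132 for UTF-8
   for C of Latin_1_Text loop
      Put_Line ("Latin-1 element:" & Natural'Image (Character'Pos (C)));
   end loop;
   for C of UTF_8_Text loop
      Put_Line ("UTF-8 element:" & Natural'Image (Character'Pos (C)));
   end loop;
end A_Umlaut_Demo;
```

Read back as Latin-1 Characters, the two UTF-8 elements are A-tilde (16#C3#) followed by the C1 control 16#84#, not A-umlaut. The two String values denote the same text only under two different interpretations of their elements.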
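The well-formedness predicate G.B. describes can be approximated with an Ada 2012 Dynamic_Predicate. This is a minimal sketch under stated assumptions:

- Is_Structurally_UTF_8, Checked_UTF_8 and UTF_8_Predicate_Demo are hypothetical names.
- The check follows only the octet structure of UTF-8: a lead octet in 16#C2#..16#F4# must be followed by the right number of 2#10xxxxxx# continuation octets.
- It does not reject overlong three- and four-octet forms, surrogate code points, or values above 16#10FFFF#.

In Ada 2012 a membership test evaluates the subtype's predicate, so the demo needs no assertion policy.

```ada
with Ada.Text_IO;            use Ada.Text_IO;
with Ada.Characters.Latin_1;

procedure UTF_8_Predicate_Demo is

   --  Hypothetical helper: structural UTF-8 check only (see lead-in);
   --  overlong 3/4-octet forms and surrogates are NOT rejected here.
   function Is_Structurally_UTF_8 (S : String) return Boolean is
      I      : Integer := S'First;
      Needed : Natural;
      B      : Natural;
   begin
      while I <= S'Last loop
         B := Character'Pos (S (I));
         if B < 16#80# then
            Needed := 0;              --  0xxxxxxx: single-octet sequence
         elsif B in 16#C2# .. 16#DF# then
            Needed := 1;              --  110xxxxx (C0, C1 excluded)
         elsif B in 16#E0# .. 16#EF# then
            Needed := 2;              --  1110xxxx
         elsif B in 16#F0# .. 16#F4# then
            Needed := 3;              --  11110xxx, up to F4
         else
            return False;             --  stray 10xxxxxx or invalid lead octet
         end if;
         if S'Last - I < Needed then
            return False;             --  sequence truncated at end of string
         end if;
         for K in 1 .. Needed loop
            B := Character'Pos (S (I + K));
            if B not in 16#80# .. 16#BF# then
               return False;          --  expected a 10xxxxxx continuation
            end if;
         end loop;
         I := I + Needed + 1;
      end loop;
      return True;
   end Is_Structurally_UTF_8;

   --  Hypothetical subtype: String values filtered by the predicate
   subtype Checked_UTF_8 is String
     with Dynamic_Predicate => Is_Structurally_UTF_8 (Checked_UTF_8);

   A_Umlaut_Latin_1 : constant String :=
     (1 => Ada.Characters.Latin_1.UC_A_Diaeresis);             --  16#C4#
   A_Umlaut_UTF_8   : constant String :=
     (Character'Val (16#C3#), Character'Val (16#84#));         --  C3 84
begin
   Put_Line (Boolean'Image (A_Umlaut_Latin_1 in Checked_UTF_8));  --  FALSE
   Put_Line (Boolean'Image (A_Umlaut_UTF_8 in Checked_UTF_8));    --  TRUE
end UTF_8_Predicate_Demo;
```

The one-element Latin-1 A-umlaut fails the predicate, because 16#C4# is a lead octet with no continuation after it. The two-octet UTF-8 form passes. The predicate only restricts which String values are admitted. It says nothing about what the admitted elements mean, which is the distinction argued over in the thread.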