From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: Strange crash on custom iterator
Date: Wed, 4 Jul 2018 09:21:00 +0200
Organization: Aioe.org NNTP Server

On 2018-07-03 23:04, Lucretia wrote:
> On Tuesday, 3 July 2018 21:18:28 UTC+1, Dmitry A. Kazakov wrote:
>
>>> Well, they kind of already did that by subtyping UTF_String from String, of which it's not a subtype, it's just they are both arrays of 8-bit entities.
>>
>> No. Both are arrays of code points and arrays of octets. The ranges of
>> code points are different. The correspondence between code points and
>> octets are different. Thus the subtyping is broken.
>
> I know the difference between code points and octets and their arrays. I was saying that UTF_String is not a valid subtype of String because String is Latin 1 and UTF_String is a superset of 7-bit ASCII, not 8-bit Latin 1.

No, that does not break subtyping if Constraint_Error is in the contract.

Subtyping is broken when the array of Latin-1 code points (String) is made to correspond to the array of representation units (octets of UTF_8_String).

An array of Latin-1 code points corresponds to an array of Unicode code points. It has nothing to do with the underlying encoding, whatever it might be.

Each string implements two unrelated array interfaces:

1. Array of encoding units, e.g. array of octets
2. Array of code points

#1 and #2 are historically confused because one resembles the other for a certain class of encodings like ASCII, UCS-2, UCS-4. They are absolutely different for UTF-8 and UTF-16.

>>> Am i wrong, should I just implement what I need on top of the standard lib and just use the UTF* types in my code? What about unbounded_utf_strings? Just use the normal unbounded_string? It's not like it's going to be checking for it to be correct utf8 is it, but I can't write an iterator for that from outside the rts though.
>>
>> There is no way to do it right in Ada for now.
>
> What do you mean exactly????

For simplicity start with designing character types: Character, Wide_Character and Wide_Wide_Character as related types.

   X : Character;            -- Character'Size = 8
   Y : Wide_Character := X;  -- This must be legal

Already this is impossible in Ada.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
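
The two array views of one string, and the explicit conversion that Ada currently needs between character types, can be sketched roughly as follows. This is only an illustration, assuming an Ada 2012 compiler that provides the standard Ada.Strings.UTF_Encoding and Ada.Characters.Conversions packages; the procedure name Units_Vs_Points and the sample text are hypothetical.

```ada
with Ada.Text_IO;
with Ada.Characters.Conversions;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

--  Hypothetical demo procedure, not taken from the thread.
procedure Units_Vs_Points is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  View #1: encoding units. U+00E9 in UTF-8 is the two octets C3 A9.
   Encoded : constant UTF.UTF_8_String :=
     Character'Val (16#C3#) & Character'Val (16#A9#);

   --  View #2: the same text decoded into an array of code points.
   Decoded : constant Wide_Wide_String :=
     UTF.Wide_Wide_Strings.Decode (Encoded);

   X : constant Character := 'A';
   --  Y : Wide_Character := X;  --  rejected: distinct, unrelated types
   Y : constant Wide_Character :=
     Ada.Characters.Conversions.To_Wide_Character (X);
begin
   Ada.Text_IO.Put_Line
     ("Encoding units:" & Integer'Image (Encoded'Length));
   Ada.Text_IO.Put_Line
     ("Code points:" & Integer'Image (Decoded'Length));
   Ada.Text_IO.Put_Line
     ("Position of Y:" & Integer'Image (Wide_Character'Pos (Y)));
end Units_Vs_Points;
```

On a conforming implementation the first length should be 2 and the second 1, which is the mismatch between interfaces #1 and #2 for UTF-8. The commented-out declaration is the assignment that would have to be legal if the character types were related.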