From: "G.B." <bauhaus@futureapps.invalid>
Subject: Re: Bug in Ada - Latin 1 is not a subset of UTF-8
Date: Fri, 21 Oct 2016 14:28:51 +0200
Message-ID: <nud1le$6is$1@dont-email.me>
In-Reply-To: <nu9s5v$18f0$1@gioia.aioe.org>
On 20.10.16 09:36, Dmitry A. Kazakov wrote:
> On 20/10/2016 02:31, Randy Brukardt wrote:
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message
>> news:nu4nee$18le$1@gioia.aioe.org...
>> ...
>>> Numeric character is a constraint expressible in Ada:
>>>
>>> subtype Numeric is Character range '0'..'9';
>>>
>>> Numeric string constraint is not expressible, but it is still a constraint.
>>
>> It's expressible as a predicate, though; that's the entire point of
>> predicates (to act like user-defined constraints):
>>
>> subtype Numeric_String is String
>> with Dynamic_Predicate => (for all E of Numeric_String => E in
>> Numeric);
>>
>> It's not 100% as good as a constraint (as modifications of individual
>> components won't be checked), but it almost always will do the job.
>
> Not nice. Is there a reason why, apart from premature optimization?
I think you can add an aspect to the component type
and have that checked on assignment to a component.
The aspect could somehow be different from the
constraint; also, just repeating it appears to loop
infinitely with current GNATs.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78066
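
For illustration only, a minimal sketch of the component-type
idea. It is not taken from the thread: the names Digit,
Digit_String, Set and Component_Check are made up, and the
local pragma Assertion_Policy (Check) is an assumption added
so that the predicate check is enabled regardless of compiler
switches. The predicate sits on the component subtype rather
than on the array, so an assignment to a single component
should be checked.

```ada
with Ada.Assertions;
with Ada.Text_IO;

procedure Component_Check is

   --  Enable predicate checks here regardless of compiler switches.
   pragma Assertion_Policy (Check);

   --  Hypothetical names, for illustration only.
   subtype Digit is Character
     with Static_Predicate => Digit in '0' .. '9';

   type Digit_String is array (Positive range <>) of Digit;

   S : Digit_String (1 .. 4) := (others => '0');

   procedure Set
     (Item : in out Digit_String; Index : Positive; To : Character) is
   begin
      --  Assigning to a component converts To to subtype Digit,
      --  which is where the predicate is checked.
      Item (Index) := To;
   end Set;

begin
   Set (S, 1, '3');   --  '3' satisfies the predicate

   begin
      Set (S, 2, 'x');   --  'x' does not
      Ada.Text_IO.Put_Line ("assignment went through unchecked");
   exception
      when Ada.Assertions.Assertion_Error =>
         Ada.Text_IO.Put_Line ("component assignment rejected");
   end;
end Component_Check;
```

The value is passed through a parameter of type Character so
that the check happens at run time rather than being diagnosed
on a static expression at compile time.
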
Anyway, a little inconvenience for starters:
   subtype My_Utf_8_String is String
     -- or, when not String, some array of any component type
     -- suitable as a byte sequence item type
     with Dynamic_Predicate => Is_Well_Formed (My_Utf_8_String);

   Bom : constant String := String'(Character'Val (16#EF#),
                                    Character'Val (16#BB#),
                                    Character'Val (16#BF#));

   function Has_Bom (U8 : String) return Boolean is
     (U8'Length >= 3
        and then U8 (U8'First .. U8'First + 2) = Bom);

   function "abs" is new Ada.Unchecked_Conversion
     (Character, Interfaces.Unsigned_8);

   function Is_Well_Formed (U8 : String) return Boolean is
     -- `U8` has permissible bit patterns for all bytes. (No Table 3.7
     -- support.)
     ((if U8'Length > 0 then
         (if Has_Bom (U8)
          then
             Is_Well_Formed (U8 (U8'First + 3 .. U8'Last))
          else
             (for all J in U8'Range =>
                (case abs U8 (J) is
                    when 2#0_0000000# .. 2#0_1111111# =>
                       -- ASCII compatibility
                       True,
                    when 2#10_000000# .. 2#10_111111# =>
                       -- is a following byte: a lead byte must sit
                       -- 1, 2, or 3 positions back, with only
                       -- following bytes in between
                       ((J > U8'First
                           and then abs U8 (J - 1)
                             in 2#110_00000# .. 2#11110_111#)
                        or else
                        (J > U8'First + 1
                           and then abs U8 (J - 1)
                             in 2#10_000000# .. 2#10_111111#
                           and then abs U8 (J - 2)
                             in 2#1110_0000# .. 2#11110_111#)
                        or else
                        (J > U8'First + 2
                           and then
                             (for all K in J - 2 .. J - 1 =>
                                abs U8 (K)
                                  in 2#10_000000# .. 2#10_111111#)
                           and then abs U8 (J - 3)
                             in 2#11110_000# .. 2#11110_111#)),
                    when 2#110_00000# .. 2#110_11111# =>
                       (if J < U8'Last then
                          (abs U8 (J + 1)
                             in 2#10_000000# .. 2#10_111111#)
                        else
                          False),
                    when 2#1110_0000# .. 2#1110_1111# =>
                       (if J + 1 < U8'Last then
                          (for all K in J + 1 .. J + 2 =>
                             abs U8 (K)
                               in 2#10_000000# .. 2#10_111111#)
                        else
                          False
                       ),
                    when 2#11110_000# .. 2#11110_111# =>
                       (if J + 2 < U8'Last then
                          (for all K in J + 1 .. J + 3 =>
                             abs U8 (K)
                               in 2#10_000000# .. 2#10_111111#)
                        else
                          False
                       ),
                    when 2#11111_000# .. 2#11111_111# =>
                       -- not in Table 3.6 (UTF-8 Bit Distribution)
                       False
                )
             )
         )
       -- String of length 0:
       else True));

   Test_Bom : constant My_Utf_8_String := Bom & "ABC";
   Test_US  : constant My_Utf_8_String := "ABC";
   Test_GR  : constant My_Utf_8_String := "ΑΒΓ";
   Test_RU  : constant My_Utf_8_String := "АБГ";
   Test_Xx  : constant My_Utf_8_String :=
     ('A', Character'Val (16#E4#), 'E');
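
As an aside, a value can also be tested against a predicated
subtype with a membership test, which yields a Boolean instead
of raising an exception. The sketch below is not from this
post: to stay self-contained it uses a deliberately simplified
stand-in predicate (Is_Ascii, Ascii_String and
Membership_Sketch are made-up names), but the same kind of
`in` test would apply to My_Utf_8_String.

```ada
with Ada.Text_IO;

procedure Membership_Sketch is

   --  Simplified stand-in for Is_Well_Formed (hypothetical):
   --  accepts only strings made of 7-bit characters.
   function Is_Ascii (S : String) return Boolean is
     (for all C of S => Character'Pos (C) < 128);

   subtype Ascii_String is String
     with Dynamic_Predicate => Is_Ascii (Ascii_String);

   Good : constant String := "ABC";
   Bad  : constant String := ('A', Character'Val (16#E4#), 'E');

begin
   --  A membership test evaluates the predicate and yields a
   --  Boolean; it does not raise Assertion_Error.
   Ada.Text_IO.Put_Line (Boolean'Image (Good in Ascii_String));
   Ada.Text_IO.Put_Line (Boolean'Image (Bad in Ascii_String));
end Membership_Sketch;
```

Good and Bad are declared as plain String here, so declaring
them does not itself trigger a predicate check; only the
membership tests evaluate Is_Ascii.
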