From: Georg Bauhaus <sb463ba@l1-hrz.uni-duisburg.de>
Subject: Re: wide_string and assertions
Date: Fri, 4 Jun 2004 17:48:04 +0000 (UTC)
Message-ID: <c9qckk$agj$1@a1-hrz.uni-duisburg.de>
In-Reply-To: 47SdnXI-D-3icyLdRVn-uQ@megapath.net
Randy Brukardt <randy@rrsoftware.com> wrote:
: "Georg Bauhaus" <sb463ba@l1-hrz.uni-duisburg.de> wrote in message
:> In an assert(x, y), y has to be static. So I guess for y I
:> will have to play tricks and use UTF-8 coding of static
:> String values?
:
: Yes.
OK. (Though I find it a bit inconsistent from some
point of view not to be able to express failure descriptions
in a language that is understood by local operators.)
: As far as demand goes, you're the first person to mention it to my
: knowledge -- which suggests that the demand is low. :-)
How do they do these things in China? Does anyone have some
experience?
: raise Assert_Error with To_UTF_8 ("Wide_Wide_String");
: so that might be preferable for this purpose.
Yes. This is what I'm currently doing with Wide_String and
Raise_Exception.
: (But you'll probably have to
: write the function yourself; there doesn't seem to be much support for
: including such functions in the Standard.)
(It would be great if someone who knows about surrogate characters could
check the following. Martin, is this similar to your function?)
with Interfaces; use Interfaces;

package body support.utils is

   -- ----------------
   -- to_UTF8_String
   -- ----------------
   -- s UTF-8 encoded. '#' is a substitute for surrogate characters
   function to_UTF8_String
     (s: Wide_String; substitute: Character := '#') return String is

      result: String(1.. 4 * s'length);
      -- Unicode has at most 4 bytes for a UTF-8 encoded character
      k: Positive := result'first;
      -- in the loop, points to the first insertion position of a
      -- "byte sequence". Plain Positive: a subtype constrained to
      -- result'range would raise Constraint_Error for an empty s.
      bits: Unsigned_32 := 2#0#;
      -- the bits representing the Wide_Character
      subtype Ch is Character;
      -- abbreviation
      B6: constant := 2#111111#;
      -- mask for the 6 payload bits of a continuation byte
   begin
      for j in s'range loop
         bits := Wide_Character'pos(s(j));  -- XXX rename instead?

         if bits <= 2#1111111# then
            -- 7 bits: one byte
            result(k) := Ch'Val(bits);
            k := k + 1;
         elsif bits <= 2#11111_111111# then
            -- up to 11 bits: two bytes
            result(k .. k + 1) :=
              (Ch'val(2#110_00000# or (shift_right(bits, 1 * 6) and 2#11111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 2;
         elsif bits in 16#d800# .. 16#dfff# then
            -- XXX surrogate characters vs UTF
            result(k) := substitute;
            k := k + 1;
         elsif
           bits = 16#fffe#
           or bits = 16#ffff#
         then
            -- ignore non-characters
            null;
         elsif bits <= 2#1111_111111_111111# then
            -- up to 16 bits: three bytes
            result(k .. k + 2) :=
              (Ch'val(2#1110_0000# or (shift_right(bits, 2 * 6) and 2#1111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 1 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 3;
         elsif bits <= 2#111_111111_111111_111111# then
            -- up to 21 bits: four bytes
            -- (not reachable while Wide_Character is 16 bits wide)
            result(k .. k + 3) :=
              (Ch'val(2#11110_000# or (shift_right(bits, 3 * 6) and 2#111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 2 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 1 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 4;
         else
            -- not Unicode
            raise Constraint_Error;
         end if;
      end loop;

      return result(1.. k - 1);
   end to_UTF8_String;

end support.utils;
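On the surrogate question: if the Wide_String is taken to hold UTF-16 (an assumption, the post does not say so), a character beyond 16#FFFF# arrives as a high/low surrogate pair, and only the combined code point reaches the four-byte form. What follows is a minimal sketch of that combination, not the function above. Surrogate_Sketch, Combine and Encode_4 are hypothetical names, and the example character U+1D11E is chosen only for illustration.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Surrogate_Sketch is

   --  Hypothetical helper: combine a UTF-16 surrogate pair into one
   --  code point. Raises Constraint_Error if the pair is malformed.
   function Combine (High, Low : Wide_Character) return Unsigned_32 is
      H : constant Unsigned_32 := Wide_Character'Pos (High);
      L : constant Unsigned_32 := Wide_Character'Pos (Low);
   begin
      if H not in 16#D800# .. 16#DBFF#
        or else L not in 16#DC00# .. 16#DFFF#
      then
         raise Constraint_Error;
      end if;
      --  Each surrogate carries 10 payload bits; the result is offset
      --  by 16#1_0000#.
      return 16#1_0000# + (H - 16#D800#) * 16#400# + (L - 16#DC00#);
   end Combine;

   --  Hypothetical helper: four-byte UTF-8 form, using the same bit
   --  layout as the four-byte branch of the posted function.
   function Encode_4 (Code : Unsigned_32) return String is
      B6 : constant := 2#111111#;
   begin
      return
        (Character'Val (2#11110_000# or (Shift_Right (Code, 18) and 2#111#)),
         Character'Val (2#10_000000# or (Shift_Right (Code, 12) and B6)),
         Character'Val (2#10_000000# or (Shift_Right (Code, 6) and B6)),
         Character'Val (2#10_000000# or (Code and B6)));
   end Encode_4;

   --  U+1D11E as the surrogate pair D834 DD1E.
   Pair : constant Wide_String :=
     (Wide_Character'Val (16#D834#), Wide_Character'Val (16#DD1E#));

   Code  : constant Unsigned_32 := Combine (Pair (1), Pair (2));
   UTF_8 : constant String := Encode_4 (Code);

   --  The UTF-8 bytes of U+1D11E are F0 9D 84 9E.
   Expected : constant String :=
     (Character'Val (16#F0#), Character'Val (16#9D#),
      Character'Val (16#84#), Character'Val (16#9E#));
begin
   if Code /= 16#1_D11E# or else UTF_8 /= Expected then
      raise Program_Error;
   end if;

   --  Print the byte values in decimal.
   for J in UTF_8'Range loop
      Ada.Text_IO.Put (Integer'Image (Character'Pos (UTF_8 (J))));
   end loop;
   Ada.Text_IO.New_Line;
end Surrogate_Sketch;
```

With something like this in the loop, the branch that currently writes the substitute character could instead look ahead one element and emit four bytes for a well-formed pair, keeping the substitute for an unpaired surrogate. Whether that is wanted depends on whether the Wide_String is meant to carry UTF-16 at all.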