From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: wide_string and assertions
Date: Fri, 4 Jun 2004 17:48:04 +0000 (UTC)
Organization: GMUGHDU
References: <47SdnXI-D-3icyLdRVn-uQ@megapath.net>

Randy Brukardt wrote:
: "Georg Bauhaus" wrote in message
:> In an assert(x, y), y has to be static. So I guess for y I
:> will have to play tricks and use UTF-8 coding of static
:> String values?
:
: Yes.

OK. (Though I find it a bit inconsistent, from some point of view,
not to be able to express failure descriptions in a language
that is understood by local operators.)

: As far as demand goes, you're the first person to mention it to my
: knowledge -- which suggests that the demand is low. :-)

How do they do these things in China? Does anyone have some
experience?

: raise Assert_Error with To_UTF_8 ("Wide_Wide_String");
: so that might be preferable for this purpose.

Yes. This is what I'm currently doing with Wide_String and
Raise_Exception.

: (But you'll probably have to
: write the function yourself; there doesn't seem to be much support for
: including such functions in the Standard.)

(It would be great if someone who knows about surrogate characters
could check the following. Martin, is this similar to your function?)

with Interfaces; use Interfaces;

package body support.utils is

   -- ----------------
   -- to_UTF8_String --
   -- ----------------

   -- s UTF-8 encoded. '#' is a substitute for surrogate characters

   function to_UTF8_String
     (s: Wide_String; substitute: Character := '#') return String
   is
      result: String(1 .. 4 * s'length);
      -- Unicode has at most 4 bytes for a UTF-8 encoded character

      k: Positive range result'first .. result'last + 1 := result'first;
      -- in the loop, points to the first insertion position of a
      -- "byte sequence". The upper bound is one past the end of result,
      -- so that an empty s does not raise Constraint_Error here.

      bits: Unsigned_32 := 2#0#;
      -- the bits representing the Wide_Character

      subtype Ch is Character;  -- abbreviation

      B6: constant := 2#111111#;
   begin
      for j in s'range loop
         bits := Wide_Character'pos(s(j));  -- XXX rename instead?

         if bits <= 2#1111111# then
            -- one byte: 0xxxxxxx
            result(k) := Ch'val(bits);
            k := k + 1;

         elsif bits <= 2#11111_111111# then
            -- two bytes: 110xxxxx 10xxxxxx
            result(k .. k + 1) :=
              (Ch'val(2#110_00000# or (shift_right(bits, 1 * 6) and 2#11111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 2;

         elsif bits in 16#d800# .. 16#dfff# then
            -- XXX surrogate characters vs UTF
            result(k) := substitute;
            k := k + 1;

         elsif bits = 16#fffe# or bits = 16#ffff# then
            -- ignore non-characters
            null;

         elsif bits <= 2#1111_111111_111111# then
            -- three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            result(k .. k + 2) :=
              (Ch'val(2#1110_0000# or (shift_right(bits, 2 * 6) and 2#1111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 1 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 3;

         elsif bits <= 2#111_111111_111111_111111# then
            -- four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            result(k .. k + 3) :=
              (Ch'val(2#11110_000# or (shift_right(bits, 3 * 6) and 2#111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 2 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 1 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 4;

         else
            -- not Unicode
            raise Constraint_Error;
         end if;
      end loop;

      return result(1 .. k - 1);
   end to_UTF8_String;

end support.utils;
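
As a sanity check on the three-byte branch above, here is a minimal sketch that applies the same shifts and masks to a single code point, U+20AC (EURO SIGN), whose UTF-8 encoding is the byte sequence E2 82 AC. The procedure name Euro_Check and the choice of test character are assumptions made for illustration only; they are not part of support.utils.

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Euro_Check is
   --  Hypothetical stand-alone check, not part of support.utils.
   Bits : constant Unsigned_32 := 16#20AC#;  --  U+20AC EURO SIGN
   B6   : constant := 2#111111#;             --  mask for the low six bits

   --  Same shifts and masks as the three-byte branch of to_UTF8_String.
   B1 : constant Unsigned_32 :=
     2#1110_0000# or (Shift_Right (Bits, 2 * 6) and 2#1111#);
   B2 : constant Unsigned_32 :=
     2#10_000000# or (Shift_Right (Bits, 1 * 6) and B6);
   B3 : constant Unsigned_32 :=
     2#10_000000# or (Shift_Right (Bits, 0 * 6) and B6);
begin
   --  The UTF-8 encoding of U+20AC is E2 82 AC.
   if B1 /= 16#E2# or B2 /= 16#82# or B3 /= 16#AC# then
      raise Program_Error;
   end if;

   --  Print the three byte values in decimal.
   Ada.Text_IO.Put_Line
     (Unsigned_32'Image (B1)
      & Unsigned_32'Image (B2)
      & Unsigned_32'Image (B3));
end Euro_Check;
```

The check uses an explicit if/raise rather than pragma Assert so that it does not depend on assertions being enabled in the compiler.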