From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,2afae4a128914036
X-Google-Attributes: gid103376,public
Path: 
 g2news1.google.com!news1.google.com!news.glorb.com!news2.telebyte.nl!news-fra1.dfn.de!news-ham1.dfn.de!news.uni-hamburg.de!cs.tu-berlin.de!uni-duisburg.de!not-for-mail
From: Georg Bauhaus <sb463ba@l1-hrz.uni-duisburg.de>
Newsgroups: comp.lang.ada
Subject: Re: wide_string and assertions
Date: Fri, 4 Jun 2004 17:48:04 +0000 (UTC)
Organization: GMUGHDU
Message-ID: <c9qckk$agj$1@a1-hrz.uni-duisburg.de>
References: <c9nj9l$cif$1@a1-hrz.uni-duisburg.de>
 <47SdnXI-D-3icyLdRVn-uQ@megapath.net>
NNTP-Posting-Host: l1-hrz.uni-duisburg.de
X-Trace: a1-hrz.uni-duisburg.de 1086371284 10771 134.91.1.34 (4 Jun 2004
 17:48:04 GMT)
X-Complaints-To: usenet@news.uni-duisburg.de
NNTP-Posting-Date: Fri, 4 Jun 2004 17:48:04 +0000 (UTC)
User-Agent: tin/1.5.8-20010221 ("Blue Water") (UNIX) (HP-UX/B.11.00
 (9000/800))
Xref: g2news1.google.com comp.lang.ada:1101
Date: 2004-06-04T17:48:04+00:00
List-Id: <comp.lang.ada>

Randy Brukardt <randy@rrsoftware.com> wrote:
: "Georg Bauhaus" <sb463ba@l1-hrz.uni-duisburg.de> wrote in message

:> In an assert(x, y), y has to be static. So I guess for y I
:> will have to play tricks and use UTF-8 coding of static
:> String values?
: 
: Yes.

OK. (Though I find it somewhat inconsistent not to be able to
express failure descriptions in a language that local operators
understand.)

: As far as demand goes, you're the first person to mention it to my
: knowledge -- which suggests that the demand is low. :-)

How do they do these things in China? Does anyone have some
experience?
 
:    raise Assert_Error with To_UTF_8 ("Wide_Wide_String");
: so that might be preferable for this purpose.

Yes. This is what I'm currently doing with Wide_String and
Raise_Exception.
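
Roughly like this, as a minimal sketch only. Assert_Error, Raise_Demo
and the cut-down To_UTF_8 are stand-ins made up for the example, not
the real code; the encoder covers just the one- to three-byte cases
and does not treat surrogates specially.

```ada
with Ada.Exceptions; use Ada.Exceptions;
with Ada.Text_IO;
with Interfaces;     use Interfaces;

procedure Raise_Demo is

   Assert_Error : exception;  -- hypothetical, stands in for the real one

   --  Cut-down encoder for this demo only: handles positions up to
   --  16#FFFF# and makes no attempt to treat surrogates specially.
   function To_UTF_8 (S : Wide_String) return String is
      Result : String (1 .. 3 * S'Length);
      K      : Natural := 0;  -- number of bytes written so far
      Bits   : Unsigned_32;
   begin
      for J in S'Range loop
         Bits := Wide_Character'Pos (S (J));
         if Bits <= 16#7F# then
            Result (K + 1) := Character'Val (Bits);
            K := K + 1;
         elsif Bits <= 16#7FF# then
            Result (K + 1) :=
              Character'Val (16#C0# or Shift_Right (Bits, 6));
            Result (K + 2) :=
              Character'Val (16#80# or (Bits and 16#3F#));
            K := K + 2;
         else
            Result (K + 1) :=
              Character'Val (16#E0# or Shift_Right (Bits, 12));
            Result (K + 2) :=
              Character'Val (16#80# or (Shift_Right (Bits, 6) and 16#3F#));
            Result (K + 3) :=
              Character'Val (16#80# or (Bits and 16#3F#));
            K := K + 3;
         end if;
      end loop;
      return Result (1 .. K);
   end To_UTF_8;

   --  "Gr", o-umlaut, sharp s, "e": not expressible as a plain
   --  7-bit String literal.
   Message : constant Wide_String :=
     "Gr" & Wide_Character'Val (16#F6#) & Wide_Character'Val (16#DF#) & "e";

begin
   --  The message parameter is an ordinary String, so it carries the
   --  UTF-8 bytes unchanged.
   Raise_Exception (Assert_Error'Identity, To_UTF_8 (Message));
exception
   when E : Assert_Error =>
      Ada.Text_IO.Put_Line (Exception_Message (E));
end Raise_Demo;
```

Whether the operator sees readable text then depends on the terminal
or log viewer interpreting the bytes as UTF-8, and an implementation
may truncate long exception messages.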


: (But you'll probably have to
: write the function yourself; there doesn't seem to be much support for
: including such functions in the Standard.)

(It would be great if someone who knows about surrogate characters could
check the following. Martin, is this similar to your function?)

with Interfaces; use Interfaces;

package body support.utils is


   -- ----------------
   -- to_UTF8_String
   -- ----------------

   -- Returns s encoded as UTF-8. Each surrogate code unit in s is
   -- replaced by substitute ('#' by default).

   function to_UTF8_String
     (s: Wide_String; substitute: Character := '#') return String is

      result: String(1.. 4 * s'length);
      --  Unicode has at most 4 bytes for a UTF-8 encoded character

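      k: Positive := result'first;
      --  in the loop, points to the first insertion position of the
      --  next "byte sequence". Not constrained to result'range: that
      --  range is null when s is empty, and the initialization would
      --  then raise Constraint_Error.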

      bits: Unsigned_32 := 2#0#;
      --  the bits representing the Wide_Character

      subtype Ch is Character;
      --  abbreviation

      B6: constant := 2#111111#;

   begin

      for j in s'range loop
         bits := Wide_Character'pos(s(j));  -- XXX rename instead?


         if bits <= 2#1111111# then
            result(k) := Ch'Val(bits);
            k := k + 1;

         elsif bits <= 2#11111_111111# then
            result(k .. k + 1) :=
              (Ch'val(2#110_00000# or (shift_right(bits, 1 * 6) and 2#11111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 2;

         elsif bits in 16#d800# .. 16#dfff# then
            -- XXX surrogate characters vs UTF
            result(k) := substitute;
            k := k + 1;

         elsif
           bits = 16#fffe#
           or bits = 16#ffff#
         then
            -- ignore non-characters
            null;

         elsif bits <= 2#1111_111111_111111# then
            result(k .. k + 2) :=
              (Ch'val(2#1110_0000# or (shift_right(bits, 2 * 6) and 2#1111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 1 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 3;

         elsif bits <= 2#111_111111_111111_111111# then
            -- not reachable for Wide_Character, whose positions end at 16#ffff#
            result(k .. k + 3) :=
              (Ch'val(2#11110_000# or (shift_right(bits, 3 * 6) and 2#111#)),
               Ch'val(2#10_000000# or (shift_right(bits, 2 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 1 * 6) and B6)),
               Ch'val(2#10_000000# or (shift_right(bits, 0 * 6) and B6)));
            k := k + 4;

         else
            -- not Unicode
            raise Constraint_Error;

         end if;
      end loop;

      return result(1.. k - 1);

   end to_UTF8_String;


end support.utils;
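
On the surrogate question: if the Wide_String is taken to hold UTF-16
code units (an assumption, the function above does not rely on it), a
high surrogate followed by a low surrogate could be combined into one
code point and written as four bytes instead of two substitutes. A
minimal sketch of that step, with Surrogate_Demo, Combine and Encode_4
as made-up names:

```ada
with Ada.Text_IO;
with Interfaces; use Interfaces;

procedure Surrogate_Demo is

   --  Combine a UTF-16 surrogate pair into one code point.
   --  Hi must be in 16#D800# .. 16#DBFF#, Lo in 16#DC00# .. 16#DFFF#.
   function Combine (Hi, Lo : Wide_Character) return Unsigned_32 is
      H : constant Unsigned_32 := Wide_Character'Pos (Hi);
      L : constant Unsigned_32 := Wide_Character'Pos (Lo);
   begin
      if H not in 16#D800# .. 16#DBFF#
        or else L not in 16#DC00# .. 16#DFFF#
      then
         raise Constraint_Error;
      end if;
      return 16#1_0000# + Shift_Left (H - 16#D800#, 10) + (L - 16#DC00#);
   end Combine;

   --  Encode a code point in 16#1_0000# .. 16#10_FFFF# as four bytes.
   function Encode_4 (Bits : Unsigned_32) return String is
      B6 : constant := 2#111111#;
   begin
      return
        (Character'Val (2#11110_000# or (Shift_Right (Bits, 18) and 2#111#)),
         Character'Val (2#10_000000# or (Shift_Right (Bits, 12) and B6)),
         Character'Val (2#10_000000# or (Shift_Right (Bits, 6) and B6)),
         Character'Val (2#10_000000# or (Bits and B6)));
   end Encode_4;

   --  U+1D11E (musical symbol G clef) is the pair D834 DD1E in UTF-16;
   --  its UTF-8 form is the byte sequence F0 9D 84 9E.
   Encoded : constant String :=
     Encode_4 (Combine (Wide_Character'Val (16#D834#),
                        Wide_Character'Val (16#DD1E#)));

begin
   --  Print the byte values in decimal, one after another.
   for I in Encoded'Range loop
      Ada.Text_IO.Put (Unsigned_32'Image (Character'Pos (Encoded (I))));
   end loop;
   Ada.Text_IO.New_Line;
end Surrogate_Demo;
```

In to_UTF8_String this would mean looking one element ahead when a
high surrogate is met, and still falling back to the substitute for
an unpaired one.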