From: Georg Bauhaus
Newsgroups: comp.lang.ada
Subject: Re: wide_string and assertions
Date: Sat, 5 Jun 2004 11:37:53 +0000 (UTC)
Organization: GMUGHDU

Martin Krischik wrote:

:> (It would be great if someone who knows about surrogate characters could
:> check the following. Martin, is this similar to your function?)
:
: As I said the functions are based on the Unicode part of XML/Ada. No point
: reinventing the wheel.

I think I haven't actually, as the wheels are slightly different ;-)
There is also a small difference from how the same functionality is
available in GNAT, see below.

I want to stay within standard Ada 95 in as many parts of the program
as possible. UTF-8 is so general, shouldn't support be generally
available, as Randy said. (The whole Mac OS X is all UTF-8, and so
was Plan 9, even before the rise of XML.)

Now there is support for more than one XML library in my program.
Using either library's To_UTF_8 function, if available at all,
might introduce yet another issue to keep track of; they might have
differing semantics (raising exceptions or not, ...).

For example, how does XML/Ada treat Unicode characters in the range
16#D800# .. 16#DFFF#? AFAICS, they are just treated like the other
Unicode_Char values. (If I knew the details of current Unicode and
ISO 10646 I could possibly answer my question: Will this work without
surprises in likely environments now and in the future (Mac, xterm -u8,
cmd.exe /u, ... or Unicode UI-windows)?)

:   function To_UTF8 (
:      Str : in Wide_String)
:   return
:      String;

Is this available yet? (I guess it will extend Wide_Character values
into XML/Ada's mod 2**32 Unicode_Char and then pass the result through
the substitution chain?)

:>   function to_UTF8_String
:>     (s: Wide_String; substitute: Character := '#') return String is
:
: No substitution. An exception is raised if it does not fit. If this is a
: problem you might be better off raising a Utf32_String.

It is a problem, as the UTF-8 strings are used in exception messages.
Raising an exception in To_UTF8 while raising another exception via

   Raise_Exception (excpt'Identity, To_UTF8 (msg));

does not seem helpful here?

:>      result: String(1.. 4 * s'length);
:>      -- Unicode has at most 4 bytes for a UTF-8 encoded character
:
: XML/Ada encodes full 32bit that is up to 6 UTF-8 bytes. In another
: discussion we already figured out that this is not really needed.

Yes.


-- Georg
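
For illustration, a minimal sketch of the non-raising variant discussed
above, in plain Ada 95. It is not the function from this thread: only
the profile and the 4 * s'length buffer come from the quoted
declaration, and the body is an assumption. In particular it assumes
that every code unit in 16#D800# .. 16#DFFF# is simply replaced by the
substitute (surrogate pairs are not combined), and that the substitute
is an ASCII character. The names UTF8_Sketch, Demo_Error and Msg are
made up for the example.

```ada
with Ada.Exceptions;
with Ada.Text_IO;

procedure UTF8_Sketch is

   --  Hypothetical sketch: encode a Wide_String as UTF-8 and never
   --  raise because of the input; surrogates become Substitute.
   function To_UTF8_String
     (S : Wide_String; Substitute : Character := '#') return String
   is
      --  Generous bound taken from the quoted declaration; a 16-bit
      --  Wide_Character never needs more than 3 bytes here.
      Result : String (1 .. 4 * S'Length);
      Last   : Natural := 0;
      Code   : Natural;

      --  Append one byte to the result buffer.
      procedure Put (Byte : Natural) is
      begin
         Last := Last + 1;
         Result (Last) := Character'Val (Byte);
      end Put;
   begin
      for I in S'Range loop
         Code := Wide_Character'Pos (S (I));
         if Code in 16#D800# .. 16#DFFF# then
            --  Assumption: lone or paired surrogates are substituted.
            Put (Character'Pos (Substitute));
         elsif Code < 16#80# then
            Put (Code);                          --  1 byte
         elsif Code < 16#800# then
            Put (16#C0# + Code / 64);            --  2 bytes
            Put (16#80# + Code mod 64);
         else
            Put (16#E0# + Code / 4096);          --  3 bytes
            Put (16#80# + (Code / 64) mod 64);
            Put (16#80# + Code mod 64);
         end if;
      end loop;
      return Result (1 .. Last);
   end To_UTF8_String;

   Demo_Error : exception;

   --  "gr", U+00FC, "n ", followed by an unpaired surrogate.
   Msg : constant Wide_String :=
     "gr" & Wide_Character'Val (16#FC#) & "n "
     & Wide_Character'Val (16#D800#);

begin
   --  The conversion cannot fail on the surrogate, so Demo_Error is
   --  the exception that actually propagates.
   Ada.Exceptions.Raise_Exception
     (Demo_Error'Identity, To_UTF8_String (Msg));
exception
   when E : Demo_Error =>
      Ada.Text_IO.Put_Line (Ada.Exceptions.Exception_Message (E));
end UTF8_Sketch;
```

With substitution the message stays usable even when the input holds
an unpaired surrogate. The price is that the conversion is lossy, so a
caller that needs to detect bad input would want a separate check.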