From: Georg Bauhaus <sb463ba@l1-hrz.uni-duisburg.de>
Subject: Re: wide_string and assertions
Date: Sat, 5 Jun 2004 11:37:53 +0000 (UTC)
Message-ID: <c9sbah$abt$1@a1-hrz.uni-duisburg.de>
In-Reply-To: 1224046.crrTJmpIeA@linux1.krischik.com
Martin Krischik <krischik@users.sourceforge.net> wrote:
>
:> (It would be great if someone who knows about surrogate characters could
:> check the following. Martin, is this similar to your function?)
:
: As I said, the functions are based on the Unicode part of XML/Ada. No point
: reinventing the wheel.
I think I haven't actually, as the wheels are slightly different ;-)
There is also a small difference from how the same functionality
is available in GNAT; see below.
I want to stay within standard Ada 95 in as many parts of the
program as possible. UTF-8 is so general; shouldn't support be
generally available, as Randy said? (All of Mac OS X is UTF-8,
and so was Plan 9, even before the rise of XML.)
Now there is support for more than one XML library in my program.
Using either library's To_UTF_8 function, if available at all,
might introduce yet another issue to keep track of; they might have
differing semantics (raising exceptions or not, ...).
For example, how does XML/Ada treat Unicode characters in the range
16#D800# .. 16#DFFF#? AFAICS, they are just treated like the other
Unicode_Char values.
(If I knew the details of current Unicode and ISO 10646, I could
possibly answer my own question: will this work without surprises
in likely environments, now and in the future (Mac, xterm -u8,
cmd.exe /u, ... or Unicode UI windows)?)
: function To_UTF8 (
: Str : in Wide_String)
: return
: String;
Is this available yet? (I guess it will extend Wide_Character values
into XML/Ada's mod 2**32 Unicode_Char and then pass the result through
the substitution chain?)
:> function to_UTF8_String
:> (s: Wide_String; substitute: Character := '#') return String is
:
: No substitution. An exception is raised if it does not fit. If this is a
: problem, you might be better off with a Utf32_String.
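It is a problem, because the UTF-8 strings are used in exception messages.
Raising an exception in To_UTF8 while raising another exception via
Raise_Exception (excpt'Identity, To_UTF8 (msg));
does not seem helpful here. The actual parameter is evaluated before
the call, so an exception from To_UTF8 would propagate instead of excpt.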
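Below is a minimal sketch of the substituting variant in Ada 95. The name and parameters follow the quoted to_UTF8_String fragment, but the body is only an illustration. It is not XML/Ada's or GNAT's code. It assumes that each Wide_Character is a BMP code point (UCS-2). Surrogate pairs are therefore not combined, and any code point in 16#D800# .. 16#DFFF# is replaced by Substitute instead of raising an exception.

```ada
--  Hypothetical sketch: encode a Wide_String as UTF-8 without ever
--  raising an exception.  Assumes each Wide_Character is a BMP code
--  point (UCS-2), so surrogate pairs are NOT combined; any code point
--  in 16#D800# .. 16#DFFF# is replaced by Substitute (keep it ASCII).
function To_UTF8_String
  (S : Wide_String; Substitute : Character := '#') return String
is
   --  The largest Wide_Character, 16#FFFF#, needs 3 UTF-8 bytes,
   --  so 3 * S'Length bytes are always enough.
   Result : String (1 .. 3 * S'Length);
   Last   : Natural := 0;

   --  Append one byte, given as a value in 0 .. 255.
   procedure Put (Byte : Natural) is
   begin
      Last := Last + 1;
      Result (Last) := Character'Val (Byte);
   end Put;
begin
   for I in S'Range loop
      declare
         C : constant Natural := Wide_Character'Pos (S (I));
      begin
         if C < 16#80# then
            Put (C);                                  --  0xxxxxxx
         elsif C < 16#800# then
            Put (16#C0# + C / 16#40#);                --  110xxxxx
            Put (16#80# + C mod 16#40#);              --  10xxxxxx
         elsif C in 16#D800# .. 16#DFFF# then
            Put (Character'Pos (Substitute));         --  lone surrogate
         else
            Put (16#E0# + C / 16#1000#);              --  1110xxxx
            Put (16#80# + (C / 16#40#) mod 16#40#);   --  10xxxxxx
            Put (16#80# + C mod 16#40#);              --  10xxxxxx
         end if;
      end;
   end loop;
   return Result (1 .. Last);
end To_UTF8_String;
```

Because this function never raises for any input, it could be used as the message argument, for example Ada.Exceptions.Raise_Exception (Constraint_Error'Identity, To_UTF8_String (Msg)), where Msg is a hypothetical Wide_String.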
:> result: String(1.. 4 * s'length);
:> -- Unicode has at most 4 bytes for a UTF-8 encoded character
:
: XML/Ada encodes the full 32 bits, that is, up to 6 UTF-8 bytes. In another
: discussion we already figured out that this is not really needed.
Yes.
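In Ada 95, Wide_Character only covers 16#0000# .. 16#FFFF#, so no single Wide_Character ever needs more than 3 UTF-8 bytes. Four-byte sequences only arise for code points beyond the BMP, and a UTF-16 Wide_String spreads those over two elements. A buffer of 4 * s'length is therefore safe but generous. The following minimal sketch computes the exact length instead. UTF8_Length is a hypothetical name, not part of XML/Ada or GNAT. The sketch assumes BMP-only input in which lone surrogates are replaced by a one-byte substitute character.

```ada
--  Hypothetical helper: exact UTF-8 length of a Wide_String, assuming
--  each Wide_Character is a BMP code point and that lone surrogates
--  (16#D800# .. 16#DFFF#) are replaced by a one-byte substitute.
function UTF8_Length (S : Wide_String) return Natural is
   N : Natural := 0;
begin
   for I in S'Range loop
      declare
         C : constant Natural := Wide_Character'Pos (S (I));
      begin
         case C is
            when 0 .. 16#7F#          => N := N + 1;
            when 16#80# .. 16#7FF#    => N := N + 2;
            when 16#D800# .. 16#DFFF# => N := N + 1;  --  substitute byte
            when others               => N := N + 3;  --  at most 3 bytes
         end case;
      end;
   end loop;
   return N;
end UTF8_Length;
```

Sizing the result with such a count costs a second pass over the input but avoids over-allocating by up to a factor of four for ASCII-heavy text.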
-- Georg
Thread overview: 21+ messages
2004-06-03 16:23 wide_string and assertions Georg Bauhaus
2004-06-04 3:37 ` Randy Brukardt
2004-06-04 8:49 ` Martin Krischik
2004-06-05 8:42 ` Pascal Obry
2004-06-05 17:15 ` Martin Krischik
2004-06-04 17:48 ` Georg Bauhaus
2004-06-05 7:10 ` Martin Krischik
2004-06-05 11:37 ` Georg Bauhaus [this message]
2004-06-05 17:11 ` Martin Krischik
2004-06-05 18:41 ` Björn Persson
2004-06-08 16:41 ` Georg Bauhaus
2004-06-09 13:19 ` Björn Persson
2004-06-09 15:03 ` Georg Bauhaus
2004-06-09 15:26 ` Björn Persson
2004-06-10 12:25 ` Georg Bauhaus
2004-06-10 13:30 ` Björn Persson
2004-06-05 12:32 ` China Björn Persson
2004-06-05 16:49 ` China, character sets Georg Bauhaus
2004-06-05 21:50 ` China Alexander E. Kopilovich
2004-06-04 20:42 ` wide_string and assertions Nick Roberts
2004-06-06 13:23 ` Björn Persson