From: "196...@googlemail.com" <1963bib@googlemail.com>
Subject: Re: XMLAda & unicode symbols
Date: Mon, 21 Jun 2021 13:06:58 -0700 (PDT)
Message-ID: <7da5a442-2ad9-4bfd-9d6c-c8885da02d05n@googlegroups.com>
In-Reply-To: <8d443406-48dc-4d4e-868c-832caabebd1en@googlegroups.com>
On Monday, 21 June 2021 at 19:33:58 UTC+1, briot.e...@gmail.com wrote:
> > A scan through XML/Ada shows that the only uses of Unicode_Char are in
> > the SAX subset. I don't see any way in the DOM subset of XML/Ada of
> > using them - someone please prove me wrong!
> Those two subsets are not independent, in fact the DOM subset is entirely based on the SAX one.
> So anything that applies to SAX also applies to DOM.
>
> That said, the DOM standard (at the time I built XML/Ada, which is about 20 years ago) likely
> did not have standard functions that receive unicode characters, only strings.
> DOM implementations are free to use any internal representation they want, and I think they did not
> have to accept any random encoding. XML/Ada is not user-friendly, it really is only a fairly low-level
> implementation of the DOM standard. Using DOM without high-level things like XPath is a real
> pain. At the time, someone else had done an XPath implementation, so I never took the time to
> duplicate that effort.
>
> Conversion between various encodings (8-bit, unicode utf-8, utf-16 or utf-32) is done via the
> `unicode` module of XML/Ada, for instance `unicode-ces-utf8.ads`. They all provide a similar API. In this case
> you want the `Encode` procedure. This is not a function (so doesn't return a Byte_Sequence directly) for efficiency
> reasons, even if a function would admittedly be more convenient for end-users.
>
> As someone rightly mentioned, it doesn't really make sense to use XML/Ada to build a tree in memory just for the
> sake of printing it, though. Ada.Text_IO or streams will be much more efficient. XML/Ada is only useful
> to parse XML streams (in which case you generally never have to encode a character to a byte sequence
> yourself).
> > > we need to convert it, then let us do so outside of it.
> > That is *exactly* what you have to do (convert outside, not throw any
> > old sequence of octets and 32-bit values somehow mashed together at
> > it)
> Well said Simon, thanks. Basically, the whole application should be utf-8 if you care at all about international
> characters (if you don't, feel free to use latin-1, or any encoding your terminal supports). So conversion should not
> occur at the interface to XML/Ada, but only on input to and output from your program.
> XML/Ada just assumes a string is a sequence of bytes. The actual encoding has to be known by the application,
> and be consistent.
> If for some reason (Windows?) you prefer utf-16 internally, you can change `sax-encodings.ads` and recompile.
> (It would have been neater to use generic traits packages, but I did not learn about them until a few years later.)
>
> It would also have been nicer to use a string type that knows about the encoding. I wrote GNATCOLL.Strings for
> that purpose several years later too. XML/Ada was never used extensively, so it was never a priority for AdaCore
> to update it to use all these packages, at the risk of either breaking backward compatibility, or duplicating the
> whole API to allow for the various string types. Not worth it.
>
> Emmanuel
Okay, now I think I am getting somewhere. A push and a prod is always welcome.
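
For anyone finding this thread later, here is a minimal sketch of the `Encode` call Emmanuel describes. It assumes the `Encode` procedure in `unicode-ces-utf8.ads` takes a `Unicode_Char`, an `in out` `Byte_Sequence` buffer and an `in out` index, writes starting at index + 1 and leaves the index on the last byte written. That is my reading of the spec, so please check it against your installed XML/Ada. The name `Encode_Demo`, the buffer size and the sample character are only for illustration.

```ada
with Ada.Text_IO;
with Unicode;          use Unicode;
with Unicode.CES;      use Unicode.CES;
with Unicode.CES.Utf8;

procedure Encode_Demo is
   --  U+00E9 (LATIN SMALL LETTER E WITH ACUTE), picked only as an example.
   Char : constant Unicode_Char := 16#E9#;

   --  Room for a single encoded character; 6 bytes is a generous
   --  upper bound for one UTF-8 sequence.
   Buffer : Byte_Sequence (1 .. 6) := (others => ' ');

   --  Assumption: Encode writes from Last + 1 onwards and updates Last
   --  to the index of the last byte it wrote.
   Last : Natural := Buffer'First - 1;
begin
   Unicode.CES.Utf8.Encode (Char, Buffer, Last);

   --  Buffer (Buffer'First .. Last) is now an ordinary String holding
   --  UTF-8 bytes, which is the form XML/Ada expects by default.
   Ada.Text_IO.Put_Line (Buffer (Buffer'First .. Last));
end Encode_Demo;
```

If that reading of the spec is right, the resulting slice can be concatenated with the rest of a UTF-8 string before it is handed to the DOM routines, so the conversion happens outside XML/Ada as Simon and Emmanuel suggest.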