comp.lang.ada
* xml/ada dropping data when pre-defined entities are separated by space?
From: björn lundin @ 2011-01-27 10:06 UTC (permalink / raw)


Hi!
I wonder if anyone has encountered difficulties with XML/Ada
when parsing documents containing pre-defined entities like &amp;

With one of these, everything is fine, but if two appear, separated by
a space, the rest of the text is dropped.

This is with XML/Ada 3.2.1 GPL, downloaded 27-Jan-2011 from ACT's
Libre site.


In the output below, I'd like to see ' & DEF' as well on the first
row...


with Input_Sources.Strings; use Input_Sources.Strings;
with Sax.Readers;           use Sax.Readers;
with DOM.Readers;           use DOM.Readers;
with DOM.Core;              use DOM.Core;
with DOM.Core.Documents;    use DOM.Core.Documents;
with DOM.Core.Nodes;        use DOM.Core.Nodes;
with Ada.Text_IO;
with Unicode.CES.Basic_8bit;

procedure Test_Xml_Dom is
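    --  Three rows: two entities separated by a space, two adjacent
    --  entities, and no entity at all.
    Xml_Str : String :=
      "<?xml version='1.0' encoding='ISO-8859-1'?>"
      & "<env><Row>ABC &amp; &amp; DEF</Row><Row>ABC &amp;&amp; DEF</Row>"
      & "<Row>ABC _ DEF</Row></env>";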
    Input   : String_Input;
    Reader  : Tree_Reader;
    Doc     : Document;
    List    : Node_List;
    N       : Node;
begin
    Open(Xml_Str, Unicode.CES.Basic_8bit.Basic_8bit_Encoding,Input);
    Reader.Set_Feature(Validation_Feature, False);
    Reader.Set_Feature(Namespace_Feature, False);
    Reader.Parse(Input);
    Input.Close;
    Doc := Reader.Get_Tree;
    List := Get_Elements_By_Tag_Name(Doc, "Row");
    for Index in 1 .. Length (List) loop
      N := Item(List, Index - 1);
      --  Prints the value of the first child node only
      Ada.Text_IO.Put_Line("Value of Row is: " &
                           Node_Value(First_Child (N)));
    end loop;
end Test_Xml_Dom;

testrun:
c:\>test_xml_dom.exe
Value of Row is: ABC &
Value of Row is: ABC && DEF
Value of Row is: ABC _ DEF


/Björn



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: Georg Bauhaus @ 2011-01-27 11:12 UTC (permalink / raw)


On 27.01.11 11:06, björn lundin wrote:

> testrun:
> c:\>test_xml_dom.exe
> Value of Row is: ABC &
> Value of Row is: ABC && DEF
> Value of Row is: ABC _ DEF

The same result on Debian stable and Debian testing.




* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: björn lundin @ 2011-01-29 15:36 UTC (permalink / raw)


On 27 Jan, 12:12, Georg Bauhaus <rm.dash-bauh...@futureapps.de> wrote:
> On 27.01.11 11:06, björn lundin wrote:
>
> > testrun:
> > c:\>test_xml_dom.exe
> > Value of Row is: ABC &
> > Value of Row is: ABC && DEF
> > Value of Row is: ABC _ DEF
>
> The same result on Debian stable and Debian testing.

The funny thing is that if I dump the DOM tree with Print, the
serialized stream has both ampersands there, plus whatever text
follows in that text node.

If I parse that XML with a SAX parser, it triggers several character
events when the ampersands are there, but only one character event
when the text node contains ordinary text only.
I did look in the sources, and it seems that the DOM parser handles
the multiple character events generated by the SAX parser well, but
still, I don't see them in the Node_Value function.
(I get the impression that the DOM tree is built with a SAX parser.)

But since I see them when using Print, I wonder if my code is faulty.


I mean, should I look for more child elements, and if so, when should
I stop?

/björn





* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: Simon Wright @ 2011-01-29 17:13 UTC (permalink / raw)


björn lundin <b.f.lundin@gmail.com> writes:

> I mean, should i look for more child elements, and if so, when sholud
> i stop?

There are in fact 2 nodes in the first line.

Dom.Core.Nodes.Child_Nodes returns a Node_List; look in Dom.Core.Nodes
for the Node_List operations Item and Length.



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: björn lundin @ 2011-01-29 21:49 UTC (permalink / raw)


On 29 Jan, 18:13, Simon Wright <si...@pushface.org> wrote:

> There are in fact 2 nodes in the first line.
>
> Dom.Core.Nodes.Child_Nodes returns a Node_List; look in Dom.Core.Nodes
> for the Node_List operations Item and Length.

Yes, there it is :-)
Thanks for the pointer, now I get more of what I want.


with Input_Sources.Strings; use Input_Sources.Strings;
with Sax.Readers;           use Sax.Readers;
with DOM.Readers;           use DOM.Readers;
with DOM.Core;              use DOM.Core;
with DOM.Core.Documents;    use DOM.Core.Documents;
with DOM.Core.Nodes;        use DOM.Core.Nodes;
with Ada.Text_IO;
with Unicode.CES.Basic_8bit;

procedure Test_Xml_Dom is
    Xml_Str1 : String := "<?xml version='1.0' encoding='ISO-8859-1'?><env><Row>";
    xml_str2 : String := "ABC &amp; &amp; DEF</Row><Row>ABC &amp;&amp; DEF</Row>"
                         & "<Row>ABC _ DEF</Row></env>";
    Xml_Str   : String := xml_str1 & Xml_Str2;
    Input   : String_Input;
    Reader  : Tree_Reader;
    Doc     : Document;
    List,cl    : Node_List;
    N,c       : Node;
begin
    Open(Xml_Str, Unicode.CES.Basic_8bit.Basic_8bit_Encoding,Input);
    Reader.Set_Feature(Validation_Feature, False);
    Reader.Set_Feature(Namespace_Feature, False);
    Reader.Parse(Input);
    Input.Close;
    Doc := Reader.Get_Tree;
    List := Get_Elements_By_Tag_Name(Doc, "Row");
    for Index in 1 .. Length (List) loop
      N := Item(List, Index - 1);
      cl:=child_nodes(N);
      --  First child only, as before
      Ada.Text_IO.Put_Line("Value of Row is: " &
                           Node_Value(First_Child (N)));
      --  Every child node of the Row; Item is zero-based
      for i in 1 .. length(cl) loop
        Ada.Text_IO.Put_Line("Value of node is: " &
                             Node_Value(item(cl,i-1)));
      end loop;
    end loop;
end Test_Xml_Dom;

gives

bnl@ubuntu-virtual:~$ ./test_xml
Value of Row is: ABC &
Value of node is: ABC &
Value of node is:  & DEF
Value of Row is: ABC && DEF
Value of node is: ABC && DEF
Value of Row is: ABC _ DEF
Value of node is: ABC _ DEF

/Björn



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: Simon Wright @ 2011-01-29 22:24 UTC (permalink / raw)


björn lundin <b.f.lundin@gmail.com> writes:

> On 29 Jan, 18:13, Simon Wright <si...@pushface.org> wrote:
>
>> There are in fact 2 nodes in the first line.
>>
>> Dom.Core.Nodes.Child_Nodes returns a Node_List; look in Dom.Core.Nodes
>> for the Node_List operations Item and Length.
>
> Yes, there it is:-)
> Thanks for the pointer, now i get more what i like

Good!

>     Xml_Str1 : String := "<?xml version='1.0' encoding='ISO-8859-1'?>
> <env><Row>";
>     xml_str2 : String := "ABC &amp; &amp; DEF</Row><Row>ABC &amp;&amp;
> DEF</Row><Row>ABC _ DEF</Row></env>";
>     Xml_Str   : String := xml_str1 & Xml_Str2;

Just a thought, this style might make things easier to read/get right...

   Xml_Str : constant String := 
     "<?xml version='1.0' encoding='ISO-8859-1'?>"
     & "<env>"
     & "<Row>ABC &amp; &amp; DEF</Row>"
     & "<Row>ABC &amp;&amp; DEF</Row>"
     & "<Row>ABC _ DEF</Row>"
     & "</env>";



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: Georg Bauhaus @ 2011-01-31  0:22 UTC (permalink / raw)


On 1/29/11 10:49 PM, björn lundin wrote:

> bnl@ubuntu-virtual:~$ ./test_xml
> Value of Row is: ABC &
> Value of node is: ABC &
> Value of node is:  & DEF

Does XML/Ada's Node have a Normalize op? (IIUC, normalize
will merge adjacent Text nodes.)  Or does Document have
a normalization op for the entire document, maybe?





* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: björn lundin @ 2011-01-31  8:04 UTC (permalink / raw)


On 31 Jan, 01:22, Georg Bauhaus <rm-host.bauh...@maps.futureapps.de>
wrote:
> On 1/29/11 10:49 PM, björn lundin wrote:
>
> > bnl@ubuntu-virtual:~$ ./test_xml
> > Value of Row is: ABC &
> > Value of node is: ABC &
> > Value of node is:  & DEF
>
> Does XML/Ada's Node have a Normalize op? (IIUC, normalize
> will merge adjacent Text nodes.)  Or does Document have
> a normalization op for the entire document, maybe?

I was just going to ask what makes XML/Ada decide to split a text node
into several nodes. I see the pattern now, of course: 'split on
predefined entity'. But why?

for Index in 1 .. Length (List) loop
  N := Item(List, Index - 1);
  cl:=child_nodes(N);
  Ada.Text_IO.Put_Line("Value of Row is: " &
                       Node_Value(First_Child (N)));
  for i in 1 .. length(cl) loop
    Ada.Text_IO.Put_Line("Value of node is: " &
                         Node_Value(item(cl,i-1)));
  end loop;
end loop;

can be replaced with

for Index in 1 .. Length (List) loop
  N := Item(List, Index - 1);
  Normalize(N);
  Ada.Text_IO.Put_Line("Value of Row is: " &
                       Node_Value(First_Child (N)));
end loop;

with the same result: after Normalize, the first child holds the whole
text of the row.
Thank you, Simon and Georg.




* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: Emmanuel Briot @ 2011-01-31 14:56 UTC (permalink / raw)


> I was just going to ask what makes xml/ada decide why a textnode is
> sometimes split into several nodes. I see the pattern now, of course,
> 'split on predefined entity' but why?

Because XML/Ada tries to be as efficient as possible, and normalizing
the document takes time that a lot of applications have no need for. If
indeed your application is only able to deal with normalized documents
(it really shouldn't be, the XML standard is quite clear that a document
is not necessarily normalized), then indeed you should call Normalize.
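
A minimal sketch of code that does not depend on normalization,
assuming only the DOM.Core.Nodes operations already used in this thread
(Child_Nodes, Length, Item, Node_Value) plus Node_Type. Text_Of is a
hypothetical helper name, not part of XML/Ada. It concatenates the
values of all Text children of a node, so it returns the same string
whether or not the parser split the text:

```ada
with Ada.Strings.Unbounded;
with DOM.Core;       use DOM.Core;
with DOM.Core.Nodes; use DOM.Core.Nodes;

--  Hypothetical helper (not part of XML/Ada): returns the concatenated
--  values of all Text children of N, in document order.
function Text_Of (N : Node) return String is
   Children : constant Node_List := Child_Nodes (N);
   Result   : Ada.Strings.Unbounded.Unbounded_String;
   Child    : Node;
begin
   for I in 1 .. Length (Children) loop
      Child := Item (Children, I - 1);   --  Item is zero-based
      --  Only Text nodes are collected; other node kinds are skipped
      if Node_Type (Child) = Text_Node then
         Ada.Strings.Unbounded.Append (Result, Node_Value (Child));
      end if;
   end loop;
   return Ada.Strings.Unbounded.To_String (Result);
end Text_Of;
```

In the test program, Node_Value (First_Child (N)) could then be
replaced by Text_Of (N). Unlike Normalize, this leaves the tree
untouched. CDATA sections are ignored here; whether to include them is
a choice for the application.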



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: björn lundin @ 2011-01-31 20:17 UTC (permalink / raw)


On 31 Jan, 15:56, Emmanuel Briot <briot.emman...@gmail.com> wrote:
> > I was just going to ask what makes xml/ada decide why a textnode is
> > sometimes split into several nodes. I see the pattern now, of course,
> > 'split on predefined entity' but why?
>
> Because XML/Ada tries to be as efficient as possible, and normalizing
> the document takes time that a lot of application have no need for. If
> indeed your application is only able to deal with normalized document
> (it really shouldn't, the XML standard is quite clear that a document
> is not necessarily normalized), then indeed you should call Normalize.

OK, thanks for the info. I did not know this part of XML. I do see
the need for splitting into chunks, but I thought that was mostly a
concern for big text nodes.
Never take anything for granted is the lesson learned.
/björn



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: björn lundin @ 2011-02-01  9:49 UTC (permalink / raw)


On 31 Jan, 15:56, Emmanuel Briot <briot.emman...@gmail.com> wrote:
> > I was just going to ask what makes xml/ada decide why a textnode is
> > sometimes split into several nodes. I see the pattern now, of course,
> > 'split on predefined entity' but why?
>
> Because XML/Ada tries to be as efficient as possible, and normalizing
> the document takes time that a lot of application have no need for. If
> indeed your application is only able to deal with normalized document
> (it really shouldn't, the XML standard is quite clear that a document
> is not necessarily normalized), then indeed you should call Normalize.

OK. A colleague of mine pointed me to
http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-1312295772
which states :
"When a document is first made available via the DOM, there is only
one Text node for each block of text. Users may create adjacent Text
nodes that represent the contents of a given element without any
intervening markup, but should be aware that there is no way to
represent the separations between these nodes in XML or HTML, so they
will not (in general) persist between DOM editing sessions. The
normalize() method on Node merges any such adjacent Text objects into
a single node for each block of text."

Isn't this the case of the first sentence?
I parse the document, I do not edit it in any way,
I traverse it, and there are several childnodes instead of one.

Or how should 'When a document is first made available via the DOM'
be interpreted?

/Björn



* Re: xml/ada dropping data when pre-defined entities are separated by space?
From: Vadim Godunko @ 2011-02-01 18:24 UTC (permalink / raw)


On Feb 1, 12:49 pm, björn lundin <b.f.lun...@gmail.com> wrote:
>
> Isn't this the case of the first sentence?
Not exactly. Your document contains entity references, but references
to predefined entities are replaced by corresponding characters, see

http://www.w3.org/TR/2000/REC-DOM-Level-2-Core-20001113/core.html#ID-11C98490

"... Note that character references and references to predefined
entities are considered to be expanded by the HTML or XML processor so
that characters are represented by their Unicode equivalent rather
than by an entity reference. ..."

So you are right: it is a bug in the DOM tree construction.
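
One way to check a freshly parsed tree for this behaviour is to look
for Text nodes that sit directly next to each other. The sketch below
assumes the same DOM.Core.Nodes operations as the test program, plus
Node_Type; Has_Adjacent_Text is a hypothetical name, not part of
XML/Ada:

```ada
with DOM.Core;       use DOM.Core;
with DOM.Core.Nodes; use DOM.Core.Nodes;

--  Hypothetical helper (not part of XML/Ada): True if Parent has two
--  Text nodes directly next to each other among its children.
function Has_Adjacent_Text (Parent : Node) return Boolean is
   Children      : constant Node_List := Child_Nodes (Parent);
   Previous_Text : Boolean := False;
   Current_Text  : Boolean;
begin
   for I in 1 .. Length (Children) loop
      --  Item is zero-based
      Current_Text := Node_Type (Item (Children, I - 1)) = Text_Node;
      if Current_Text and Previous_Text then
         return True;
      end if;
      Previous_Text := Current_Text;
   end loop;
   return False;
end Has_Adjacent_Text;
```

Going by the output posted earlier in the thread, the first Row
("ABC &" followed by " & DEF") would be reported by such a check right
after parsing, and the other two rows would not.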


