comp.lang.ada
* HTML parser in Ada ?
@ 2002-11-15 10:49 Gautier direct_replies_not_read
  2002-11-15 16:06 ` Preben Randhol
  0 siblings, 1 reply; 11+ messages in thread
From: Gautier direct_replies_not_read @ 2002-11-15 10:49 UTC


Is there somewhere Ada source(s) for an (ideally simple) HTML
parser ? - Before I reinvent the wheel...
________________________________________________________
Gautier  --  http://www.mysunrise.ch/users/gdm/gsoft.htm

NB: For a direct answer, address on the Web site!








* Re: HTML parser in Ada ?
@ 2002-11-15 11:08 Grein, Christoph
  2002-11-15 14:24 ` Victor Porton
  0 siblings, 1 reply; 11+ messages in thread
From: Grein, Christoph @ 2002-11-15 11:08 UTC


See Ted Dennison's OpenToken page. There is a (primitive) HTML lexer (no parser) 
that I wrote.
There are also lexers for Ada, Java (by me), C++, Modula 3 (not by me).

> Is there somewhere Ada source(s) for an (ideally simple) HTML
> parser ? - Before I reinvent the wheel...




* Re: HTML parser in Ada ?
  2002-11-15 11:08 HTML parser in Ada ? Grein, Christoph
@ 2002-11-15 14:24 ` Victor Porton
  0 siblings, 0 replies; 11+ messages in thread
From: Victor Porton @ 2002-11-15 14:24 UTC


In article <mailman.1037358902.360.comp.lang.ada@ada.eu.org>,
	"Grein, Christoph" <christoph.grein@eurocopter.com> writes:
> See Ted Dennison's OpenToken page. There is a (primitive) HTML lexer (no parser) 
> that I wrote.
> There are also lexers for Ada, Java (by me), C++, Modula 3 (not by me).

Seemingly it does not support Unicode.




* Re: HTML parser in Ada ?
  2002-11-15 10:49 Gautier direct_replies_not_read
@ 2002-11-15 16:06 ` Preben Randhol
  2002-11-15 17:00   ` Adrian Knoth
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Preben Randhol @ 2002-11-15 16:06 UTC


Gautier direct_replies_not_read wrote:
> Is there somewhere Ada source(s) for an (ideally simple) HTML
> parser ? - Before I reinvent the wheel...

If you are not making something that is meant to read the web-pages on
the net, please make something that reads XHTML only, or that follows
the HTML DTD strictly and rejects all faulty pages. Trying to make
something that can read real web-pages is very difficult, and your
application becomes very error-prone. Most web-pages out there are broken
and do not use proper HTML. So if you want to display those pages
correctly, you have to make a lot of exceptions to the HTML DTD.

Hmm, I think it is time I validate my own web-pages again. :-)

It would be great to have an (X)HTML engine that does not crash, though.

> MSN 8 with e-mail virus protection service: 2 months FREE* 

After two months they will start sending you viruses so you will
pay I guess ;-)

> http://join.msn.com/?page=features/virus

Resistance is futile.

After learning that simple javascript on a page can get IE to delete all
your files, I pull out the network cable when I have to use Windows for
something.

-- 
Preben Randhol ------------------------ http://www.pvv.org/~randhol/ --
"There are three things you can do to a woman. You can love her, suffer
 for her, or turn her into literature."  - Justine, by Lawrence Durrell




* Re: HTML parser in Ada ?
  2002-11-15 16:06 ` Preben Randhol
@ 2002-11-15 17:00   ` Adrian Knoth
  2002-11-16  4:11   ` Randy Brukardt
  2002-11-16 19:43   ` Gautier
  2 siblings, 0 replies; 11+ messages in thread
From: Adrian Knoth @ 2002-11-15 17:00 UTC


Preben Randhol <randhol+news@pvv.org> wrote:

> After learning that simple javascript on a page can get IE to delete all
> your files, I pull out the network cable when I have to use Windows for
> something.

Beware of JavaScript burned onto CD-ROMs. You should remove your CD drive,
too :)


-- 
mail: adi@thur.de  	http://adi.thur.de	PGP: v2-key via keyserver

Today he ate his last bread roll; day after day he smoked nothing but Camels.




* Re: HTML parser in Ada ?
  2002-11-15 16:06 ` Preben Randhol
  2002-11-15 17:00   ` Adrian Knoth
@ 2002-11-16  4:11   ` Randy Brukardt
  2002-11-16 19:43   ` Gautier
  2 siblings, 0 replies; 11+ messages in thread
From: Randy Brukardt @ 2002-11-16  4:11 UTC


Preben Randhol wrote in message ...
>After learning that simple javascript on a page can get IE to delete all
>your files, I pull out the network cable when I have to use Windows for
>something.

Javascript is evil. I run with it off and only turn it on for trusted
sites (i.e. my bank) that don't otherwise work. With it off, I hardly
ever get popup ads and other worthless cruft. If a site doesn't provide
a way to work without it, it's hardly worth my time. I recommend this
approach to everyone, because the only way to get rid of it is to
essentially boycott sites that don't work without it.

(This is true 10x in e-mail.)

              Randy Brukardt.







* Re: HTML parser in Ada ?
  2002-11-15 16:06 ` Preben Randhol
  2002-11-15 17:00   ` Adrian Knoth
  2002-11-16  4:11   ` Randy Brukardt
@ 2002-11-16 19:43   ` Gautier
  2002-11-17 12:00     ` Preben Randhol
  2002-11-18 14:17     ` Georg Bauhaus
  2 siblings, 2 replies; 11+ messages in thread
From: Gautier @ 2002-11-16 19:43 UTC


Preben Randhol:

> If you are not making something that is meant to read the web-pages on
> the net, please make something that reads XHTML only, or that follows
> the HTML DTD strictly and rejects all faulty pages. Trying to make
> something that can read real web-pages is very difficult, and your
> application becomes very error-prone. Most web-pages out there are broken
> and do not use proper HTML. So if you want to display those pages
> correctly, you have to make a lot of exceptions to the HTML DTD.

[x] Yes, I'm aware of it. I would put a "HTML_DTD_Strict: Boolean;"
somewhere, since one aim is to filter HTML files
"from the Web": remove the evil Javascript, meta's, ...
A more ambitious task would be to transform junk HTML into
compliant HTML - but I won't do it (mmmh... unless...).
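
A minimal sketch of the "remove the evil Javascript" part of such a filter, assuming a purely textual pass rather than a real parser. The names Strip_Scripts and Strip_Scripts_Demo are hypothetical and belong to neither XML/Ada nor OpenToken; only standard library packages are used. It does not touch event-handler attributes (onclick and friends) or meta elements, and it would also cut at any tag whose name merely begins with "script".

```ada
with Ada.Characters.Handling;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Strip_Scripts_Demo is

   --  Hypothetical helper (not part of any library): returns Source with
   --  every "<script ...> ... </script>" span removed. The search is
   --  case-insensitive and purely textual; no HTML parsing is done.
   function Strip_Scripts (Source : String) return String is
      use Ada.Strings.Unbounded;

      --  Copies indexed from 1 so that positions in S and Lower line up.
      S     : constant String (1 .. Source'Length) := Source;
      Lower : constant String (1 .. Source'Length) :=
        Ada.Characters.Handling.To_Lower (Source);

      End_Tag : constant String := "</script>";
      Result  : Unbounded_String;
      Pos     : Positive := 1;
   begin
      while Pos <= S'Last loop
         declare
            Start : constant Natural :=
              Ada.Strings.Fixed.Index (Lower (Pos .. Lower'Last), "<script");
            Stop  : Natural;
         begin
            if Start = 0 then
               --  No further script element: keep the rest unchanged.
               Append (Result, S (Pos .. S'Last));
               exit;
            end if;

            --  Keep the text before the script element, then skip past
            --  its end tag. An unterminated element drops the remainder.
            Append (Result, S (Pos .. Start - 1));
            Stop :=
              Ada.Strings.Fixed.Index (Lower (Start .. Lower'Last), End_Tag);
            exit when Stop = 0;
            Pos := Stop + End_Tag'Length;
         end;
      end loop;
      return To_String (Result);
   end Strip_Scripts;

begin
   --  Prints: <p>Hi</p><p>Bye</p>
   Ada.Text_IO.Put_Line
     (Strip_Scripts ("<p>Hi</p><SCRIPT>alert(1)</SCRIPT><p>Bye</p>"));
end Strip_Scripts_Demo;
```

Dropping everything after an unterminated script element is a deliberately cautious choice for a filter; a strict mode could instead reject the page.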

[...]
> > http://join.msn.com/?page=features/virus
> 
> Resistance is futile.

Didn't you know it ?

OK - I'll take a look at proposed solutions:
XML/Ada and OpenToken.
Thanks!
________________________________________________________
Gautier  --  http://www.mysunrise.ch/users/gdm/gsoft.htm

NB: For a direct answer, address on the Web site!




* Re: HTML parser in Ada ?
  2002-11-16 19:43   ` Gautier
@ 2002-11-17 12:00     ` Preben Randhol
  2002-12-02 19:50       ` Nicolas Seriot
  2002-11-18 14:17     ` Georg Bauhaus
  1 sibling, 1 reply; 11+ messages in thread
From: Preben Randhol @ 2002-11-17 12:00 UTC


Gautier wrote:
> [x] Yes, I'm aware of it. I would put a "HTML_DTD_Strict: Boolean;"
> somewhere, since one aim is to filter HTML files
> "from the Web": remove the evil Javascript, meta's, ...
> A more ambitious task would be to transform junk HTML into
> compliant HTML - but I won't do it (mmmh... unless...).

If you want to do the latter, then make sure you read the whole HTML
file first and only then start mending it. Don't try to parse and fix as you
read the file. The biggest problem is that people put tags inside other
tags where it is not allowed. If only HTML had been stricter from the start,
or should I say the browsers. I still don't know of a browser that will
tell you a page is not valid HTML. There are the W3C pages for doing the
validation, but I think that browsers should also do this, so that when
people make web pages they will see it and fix it right away.
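
As a rough illustration of checking a complete document before mending anything, here is a sketch that takes the whole text in memory and only reports whether every end tag closes the most recently opened element. All names (Properly_Nested, Name_Of, Is_Void, Check_Nesting_Demo) are hypothetical, and the short list of empty elements is an assumption rather than a full reading of the DTD. It does not check which elements are allowed inside which, it treats legally omitted end tags (such as a missing </p>) as faults, and it is confused by '>' inside attribute values or '<' inside scripts.

```ada
with Ada.Characters.Handling;
with Ada.Containers.Indefinite_Vectors;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Check_Nesting_Demo is

   use Ada.Characters.Handling;

   package Name_Stacks is new Ada.Containers.Indefinite_Vectors
     (Index_Type => Positive, Element_Type => String);

   --  Lower-cased element name of the text between '<' and '>'.
   --  Returns "" for comments, doctype declarations and the like.
   function Name_Of (Tag : String) return String is
      First : Positive := Tag'First;
      Last  : Natural;
   begin
      if Tag'Length > 0 and then Tag (First) = '/' then
         First := First + 1;
      end if;
      Last := First - 1;
      while Last < Tag'Last and then Is_Alphanumeric (Tag (Last + 1)) loop
         Last := Last + 1;
      end loop;
      return To_Lower (Tag (First .. Last));
   end Name_Of;

   --  Assumed subset of the elements that never take an end tag.
   function Is_Void (Name : String) return Boolean is
   begin
      return Name = "br" or else Name = "hr" or else Name = "img"
        or else Name = "meta" or else Name = "link" or else Name = "input";
   end Is_Void;

   --  True when every end tag in Doc closes the most recently opened
   --  element and nothing is left open at the end. Doc is expected to
   --  hold the complete file, read beforehand.
   function Properly_Nested (Doc : String) return Boolean is
      Open_Elements : Name_Stacks.Vector;
      Pos           : Positive := Doc'First;
   begin
      while Pos <= Doc'Last loop
         declare
            Lt : constant Natural :=
              Ada.Strings.Fixed.Index (Doc (Pos .. Doc'Last), "<");
            Gt : Natural;
         begin
            exit when Lt = 0;
            Gt := Ada.Strings.Fixed.Index (Doc (Lt .. Doc'Last), ">");
            if Gt = 0 then
               return False;  --  tag opened but never terminated
            end if;
            declare
               Tag  : constant String := Doc (Lt + 1 .. Gt - 1);
               Name : constant String := Name_Of (Tag);
            begin
               if Name = "" then
                  null;  --  comment, doctype or stray '<': ignored here
               elsif Tag (Tag'First) = '/' then
                  if Open_Elements.Is_Empty
                    or else Open_Elements.Last_Element /= Name
                  then
                     return False;  --  end tag does not match
                  end if;
                  Open_Elements.Delete_Last;
               elsif Tag (Tag'Last) /= '/' and then not Is_Void (Name) then
                  Open_Elements.Append (Name);
               end if;
            end;
            Pos := Gt + 1;
         end;
      end loop;
      return Open_Elements.Is_Empty;
   end Properly_Nested;

begin
   --  Prints TRUE, then FALSE.
   Ada.Text_IO.Put_Line
     (Boolean'Image (Properly_Nested ("<p><b>ok</b></p>")));
   Ada.Text_IO.Put_Line
     (Boolean'Image (Properly_Nested ("<b><i>bad</b></i>")));
end Check_Nesting_Demo;
```

Because the function sees the whole document, a repair step run afterwards can decide how to mend overlapping elements with the full context available instead of guessing halfway through the input.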

-- 
Preben Randhol ------------------------ http://www.pvv.org/~randhol/ --
"There are three things you can do to a woman. You can love her, suffer
 for her, or turn her into literature."  - Justine, by Lawrence Durrell




* Re: HTML parser in Ada ?
@ 2002-11-18  6:38 Grein, Christoph
  0 siblings, 0 replies; 11+ messages in thread
From: Grein, Christoph @ 2002-11-18  6:38 UTC


From: porton@ex-code.com (Victor Porton)
> 
> In article <mailman.1037358902.360.comp.lang.ada@ada.eu.org>,
> 	"Grein, Christoph" <christoph.grein@eurocopter.com> writes:
> > See Ted Dennison's OpenToken page. There is a (primitive) HTML lexer (no parser)
> > that I wrote.
> > There are also lexers for Ada, Java (by me), C++, Modula 3 (not by me).
> 
> Seemingly it does not support Unicode.

As I said, it's a primitive lexer. Feel free to add support for Unicode
(Standard.Wide_String :-)




* Re: HTML parser in Ada ?
  2002-11-16 19:43   ` Gautier
  2002-11-17 12:00     ` Preben Randhol
@ 2002-11-18 14:17     ` Georg Bauhaus
  1 sibling, 0 replies; 11+ messages in thread
From: Georg Bauhaus @ 2002-11-18 14:17 UTC


Gautier <gautier_niouzes@hotmail.com> wrote:
: OK - I'll take a look at proposed solutions:
: XML/Ada and OpenToken.

Be aware, though, that you are on the edge of natural language
processing when dealing with real-world web pages. The best
one can hope for is that you don't need more than full SGML with
tag minimization features. Second best will likely require
context-sensitive parsing and some heuristics.

-- georg




* Re: HTML parser in Ada ?
  2002-11-17 12:00     ` Preben Randhol
@ 2002-12-02 19:50       ` Nicolas Seriot
  0 siblings, 0 replies; 11+ messages in thread
From: Nicolas Seriot @ 2002-12-02 19:50 UTC


Preben Randhol <randhol+news@pvv.org> wrote:

> I still don't know of a browser that will tell you a page is not valid HTML.

iCab (for MacOS) does :

<http://www.icab.de/>
<http://www.seriot.ch/canardmac/screenshots/erreurs.gif>

-- 
Nicolas Seriot
www.seriot.ch



