From: gautier_niouzes@hotmail.com (Gautier)
Subject: Re: HTML parser in Ada ?
Date: 16 Nov 2002 11:43:05 -0800
Date: 2002-11-16T19:43:06+00:00
Message-ID: <17cd177c.0211161143.7f8d5842@posting.google.com>
In-Reply-To: slrnata6rh.161.randhol+news@kiuk0152.chembio.ntnu.no
Preben Randhol:
> If you are not making something that is aimed to read the web-pages on
> the net, please make something that reads XHTML only or that it follows
> the HTML DTD strictly and rejects all faulty pages. Trying to make
> something that can read web-pages is very difficult and your
> application gets very error-prone. Most web-pages out there are broken
> and do not use proper HTML. So if you want to display the pages
> correctly then you have to make a lot of exceptions to the HTML DTD.
[x] Yes, I'm aware of it. I would put a "HTML_DTD_Strict: Boolean;"
somewhere, since one aim is to filter HTML files
"from the Web": remove the evil JavaScript, meta tags, ...
A more ambitious task would be to transform junk HTML into
compliant HTML - but I won't do it (mmmh... unless...).
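
A minimal sketch of such a filter, to make the idea concrete. Only the
name HTML_DTD_Strict comes from the message above; Filter, Filter_Demo
and Malformed_HTML are hypothetical names, and the code assumes an
Ada 2005 or later compiler (Index with a From parameter, "raise ... with").
It matches tags by plain text search, so it is nowhere near a real
DTD-driven parser: it only drops <script>...</script> blocks and <meta>
tags, and uses the flag to choose between rejecting and tolerating an
unclosed tag.

```ada
with Ada.Characters.Handling;  use Ada.Characters.Handling;
with Ada.Strings.Fixed;
with Ada.Strings.Unbounded;    use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Filter_Demo is

   Malformed_HTML : exception;  --  hypothetical, raised in strict mode only

   --  Hypothetical filter: removes <script>...</script> and <meta ...>.
   --  Tag detection is a naive prefix match (it would also hit a tag
   --  named "<metadata"), which is acceptable for a sketch only.
   function Filter
     (Source : String; HTML_DTD_Strict : Boolean) return String
   is
      --  Work on 1-based copies so both strings share the same indices.
      S      : constant String (1 .. Source'Length) := Source;
      Lower  : constant String := To_Lower (S);
      Result : Unbounded_String;
      Pos    : Positive := 1;
      Stop   : Natural;

      function Starts_With (Tag : String) return Boolean is
      begin
         return Pos + Tag'Length - 1 <= Lower'Last
           and then Lower (Pos .. Pos + Tag'Length - 1) = Tag;
      end Starts_With;
   begin
      while Pos <= S'Last loop
         if Starts_With ("<script") then
            Stop := Ada.Strings.Fixed.Index (Lower, "</script>", From => Pos);
            if Stop = 0 then
               if HTML_DTD_Strict then
                  raise Malformed_HTML with "unclosed <script>";
               end if;
               exit;  --  lenient mode: silently drop the rest
            end if;
            Pos := Stop + 9;  --  skip past "</script>" (9 characters)
         elsif Starts_With ("<meta") then
            Stop := Ada.Strings.Fixed.Index (Lower, ">", From => Pos);
            if Stop = 0 then
               if HTML_DTD_Strict then
                  raise Malformed_HTML with "unclosed <meta>";
               end if;
               exit;
            end if;
            Pos := Stop + 1;
         else
            Append (Result, S (Pos));  --  ordinary character: keep it
            Pos := Pos + 1;
         end if;
      end loop;
      return To_String (Result);
   end Filter;

begin
   --  Prints: <p>Hi</p>ok
   Ada.Text_IO.Put_Line
     (Filter ("<p>Hi</p><script>alert(1)</script><META name=x>ok",
              HTML_DTD_Strict => True));
end Filter_Demo;
```

With HTML_DTD_Strict set to False, the same call on an input whose
<script> is never closed returns the text before the tag instead of
raising Malformed_HTML.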
[...]
> > http://join.msn.com/?page=features/virus
>
> Resistance is futile.
Didn't you know it ?
OK - I'll take a look at proposed solutions:
XML/Ada and OpenToken.
Thanks!
________________________________________________________
Gautier -- http://www.mysunrise.ch/users/gdm/gsoft.htm
NB: For a direct reply, use the address on the web site!