comp.lang.ada
From: Preben Randhol <randhol+news@pvv.org>
Subject: Re: HTML parser in Ada ?
Date: Sun, 17 Nov 2002 12:00:54 +0000 (UTC)
Message-ID: <slrnatf173.e9.randhol+news@kiuk0152.chembio.ntnu.no>
In-Reply-To: 17cd177c.0211161143.7f8d5842@posting.google.com

Gautier wrote:
> [x] Yes, I'm aware of it. I would put a "HTML_DTD_Strict: Boolean;"
> somewhere, since one aim is to filter HTML files
> "from the Web": remove the evil Javascript, meta's, ...
> A more ambitious task would be to transform junk HTML into
> compliant one - but I won't do it (mmmh... unless...).

If you want to do the latter, make sure you read the whole HTML file
first and only then start mending it. Don't try to parse and fix as you
read the file. The biggest problem is that people put tags inside other
tags where that is not allowed. If only HTML, or rather the browsers,
had been stricter from the start. I still don't know of a browser that
will tell you a page is not valid HTML. There are the W3C pages for
validation, but I think browsers should do this too, so that people
see the errors while making their web pages and fix them right away.
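
To illustrate the "read everything first, then look for bad nesting"
approach, here is a minimal sketch. It is written in Python only for
brevity; the same two-pass structure should carry over to an Ada
implementation. The tag sets and the names TagCollector and
find_nesting_problems are hypothetical, the sets are a small
illustrative subset rather than the real HTML DTD, and the code only
reports problems instead of mending them.

```python
from html.parser import HTMLParser

# Illustrative subsets only (assumption): not the full HTML 4 DTD.
INLINE = {"a", "b", "i", "em", "strong", "span", "font"}
BLOCK = {"p", "div", "table", "ul", "ol", "h1", "h2", "h3"}
VOID = {"br", "hr", "img", "meta", "input", "link"}  # never have a close tag


class TagCollector(HTMLParser):
    """Pass 1: record every tag event for the whole file, judge nothing."""

    def __init__(self):
        super().__init__()
        self.events = []  # (kind, tag, line) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.events.append(("open", tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.events.append(("close", tag, self.getpos()[0]))


def find_nesting_problems(html_text):
    """Pass 2: walk the complete event list and report bad nesting."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()

    problems = []
    stack = []  # currently open tags, innermost last
    for kind, tag, line in collector.events:
        if kind == "open":
            inline_parents = [t for t in stack if t in INLINE]
            if tag in BLOCK and inline_parents:
                problems.append((line, f"<{tag}> inside <{inline_parents[-1]}>"))
            stack.append(tag)
        elif tag in stack:
            # Unwind to the matching open tag; anything above it was
            # left unclosed by the page author.
            while stack[-1] != tag:
                unclosed = stack.pop()
                problems.append((line, f"<{unclosed}> not closed before </{tag}>"))
            stack.pop()
        else:
            problems.append((line, f"stray </{tag}>"))
    return problems
```

Because the second pass sees the complete list of tags, a repair step
could look ahead (for example, to find where a misplaced tag is
eventually closed) before deciding how to fix it, which is hard to do
while still reading the file.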

-- 
Preben Randhol ------------------------ http://www.pvv.org/~randhol/ --
"There are three things you can do to a woman. You can love her, suffer
 for her, or turn her into literature."  - Justine, by Lawrence Durrell




Thread overview: 11+ messages
2002-11-15 10:49 HTML parser in Ada ? Gautier direct_replies_not_read
2002-11-15 16:06 ` Preben Randhol
2002-11-15 17:00   ` Adrian Knoth
2002-11-16  4:11   ` Randy Brukardt
2002-11-16 19:43   ` Gautier
2002-11-17 12:00     ` Preben Randhol [this message]
2002-12-02 19:50       ` Nicolas Seriot
2002-11-18 14:17     ` Georg Bauhaus
  -- strict thread matches above, loose matches on Subject: below --
2002-11-15 11:08 Grein, Christoph
2002-11-15 14:24 ` Victor Porton
2002-11-18  6:38 Grein, Christoph