From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: Encaspulation: What to export
Date: Thu, 30 Nov 2017 15:42:23 -0600
Message-ID: <ovptvv$8uf$1@franka.jacob-sparre.dk>
In-Reply-To: 1aab7965-08cf-472f-9322-bfabb6f2c728@googlegroups.com
<gautier_niouzes@hotmail.com> wrote in message
news:1aab7965-08cf-472f-9322-bfabb6f2c728@googlegroups.com...
> Randy Brukardt:
>
>> Really? We don't have any parser (just a lexer) in the search engine
>> crawler. As I recall, section closes are counted rather than anything
>> more complex.
>
> Sure. You could say the same for gathering identifiers from Ada sources
> for a search engine for Ada sources. A lexer is ok for that job. Would you
> conclude that Ada doesn't need to be parsed?
> Or conversely, how would you manage to display HTML lists or tables without
> a parser? With a search engine crawler you just throw away the HTML
> structures. This is okay for your crawler: you just need the text between
> the tags.
Not really true, since we tag the URLs with the type of reference
(automatic, like images, or manual, like links) and that requires
identifying the enclosing construct as well as the "attribute" containing
the URL. And there are a few cases where the meaning of the attribute is
different in different constructs.
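
As an illustration of the point above, here is a minimal sketch of tagging URLs by enclosing construct and attribute using only a lexer. It is not the crawler's actual code: the table `REFERENCE_KINDS`, the function `tag_urls`, and both regular expressions are hypothetical names invented for this example, and the sketch ignores comments and script contents.

```python
import re

# Hypothetical table: (element, attribute) -> kind of reference.
# "automatic" is fetched without user action; "manual" is followed by the user.
# Note that href means different things in <a> and in <link>.
REFERENCE_KINDS = {
    ("img", "src"): "automatic",
    ("script", "src"): "automatic",
    ("link", "href"): "automatic",
    ("a", "href"): "manual",
}

# One opening tag: element name, then everything up to the closing '>'
# (quoted attribute values may themselves contain '>').
TAG_RE = re.compile(
    r"<\s*([a-zA-Z][a-zA-Z0-9]*)((?:[^>\"']|\"[^\"]*\"|'[^']*')*)>"
)
# One attribute: name = "value" | 'value' | bare-value
ATTR_RE = re.compile(
    r"([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*"
    r"(?:\"([^\"]*)\"|'([^']*)'|([^\s\"'>]+))"
)

def tag_urls(html):
    """Return (url, kind) pairs in document order; no tree is built."""
    found = []
    for tag in TAG_RE.finditer(html):
        element = tag.group(1).lower()
        for attr in ATTR_RE.finditer(tag.group(2)):
            name = attr.group(1).lower()
            # Exactly one of the three value groups matched.
            value = next(g for g in attr.groups()[1:] if g is not None)
            kind = REFERENCE_KINDS.get((element, name))
            if kind is not None:
                found.append((value, kind))
    return found
```

The lookup key is the pair of element and attribute, which is what lets the same attribute name carry a different meaning in different constructs.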
As far as lists or tables go, you can build an HTML tree without any
parsing, so I don't see any requirement to parse HTML. (It might be easier
to build a parser using a tool than a hand-constructed tree builder if one's
needs are complex enough, but that doesn't change the underlying issue.)
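
One possible reading of a hand-constructed tree builder is sketched below: a lexer emits open tags, close tags, and text, and a stack of open elements tracks the nesting with no grammar involved. This is an assumption about what such a builder might look like, not the author's implementation; `build_tree`, `TOKEN_RE`, and the `VOID` set are hypothetical names, and the `VOID` set is deliberately incomplete.

```python
import re

# Lexer: an open or close tag, or a run of text between tags.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")

# Elements that never have a close tag (partial list, for illustration).
VOID = {"br", "img", "hr", "input", "meta", "link"}

def build_tree(html):
    """Build a nested dict tree from tokens using a stack of open elements."""
    root = {"tag": "#root", "children": []}
    stack = [root]
    for closing, name, text in TOKEN_RE.findall(html):
        if text:
            if text.strip():
                stack[-1]["children"].append(text.strip())
        elif closing:
            name = name.lower()
            # Pop back to the nearest matching open element;
            # a close tag with no matching open is ignored.
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth]["tag"] == name:
                    del stack[depth:]
                    break
        else:
            node = {"tag": name.lower(), "children": []}
            stack[-1]["children"].append(node)
            if node["tag"] not in VOID:
                stack.append(node)
    return root
```

For `<ul><li>one</li><li>two<br></li></ul>` this yields a `ul` node with two `li` children, which is enough structure to lay out a list.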
Randy.