From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!feeder.eternal-september.org!nntp-feed.chiark.greenend.org.uk!ewrotcd!newsfeed.xs3.de!io.xs3.de!news.jacob-sparre.dk!franka.jacob-sparre.dk!pnx.dk!.POSTED.rrsoftware.com!not-for-mail
From: "Randy Brukardt" <randy@rrsoftware.com>
Newsgroups: comp.lang.ada
Subject: Re: Encaspulation: What to export
Date: Thu, 30 Nov 2017 15:42:23 -0600
Organization: JSA Research & Innovation
Message-ID: <ovptvv$8uf$1@franka.jacob-sparre.dk>
References: <ovhoqv$108h$1@gioia.aioe.org> <ovhqt6$159h$1@gioia.aioe.org>
 <ovhst9$1a43$1@gioia.aioe.org> <ovigma$fi6$1@franka.jacob-sparre.dk>
 <ovjv01$1r5c$1@gioia.aioe.org> <ovkod5$7s6$1@franka.jacob-sparre.dk>
 <8666203a-4e42-438d-8fe0-1a63f643955f@googlegroups.com>
 <b9353594-3831-4fcb-b27c-b0afca754a60@googlegroups.com>
 <ovn5g4$qqa$1@franka.jacob-sparre.dk>
 <1aab7965-08cf-472f-9322-bfabb6f2c728@googlegroups.com>
Injection-Date: Thu, 30 Nov 2017 21:42:23 -0000 (UTC)
Injection-Info: franka.jacob-sparre.dk;
 posting-host="rrsoftware.com:24.196.82.226";
	logging-data="9167"; mail-complaints-to="news@jacob-sparre.dk"
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2900.5931
X-RFC2646: Format=Flowed; Original
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.7246
Xref: reader02.eternal-september.org comp.lang.ada:49280
Date: 2017-11-30T15:42:23-06:00
List-Id: <comp.lang.ada>

<gautier_niouzes@hotmail.com> wrote in message 
news:1aab7965-08cf-472f-9322-bfabb6f2c728@googlegroups.com...
> Randy Brukardt:
>
>> Really? We don't have any parser (just a lexer) in the search engine
>> crawler. As I recall, section closes are counted rather than anything
>> more complex.
>
> Sure. You could say the same for gathering identifiers from Ada sources 
> for a search engine for Ada sources. A lexer is ok for that job. Would you 
> conclude that Ada doesn't need to be parsed ?
> Or reversely, how would you manage to display HTML lists or tables without 
> a parser ? With a search engine crawler you just throw away the HTML 
> structures. This is okay for your crawler: you just need the text between 
> the tags.

Not really true, since we tag the URLs with the type of reference 
(automatic, like images, or manual, like links) and that requires 
identifying the enclosing construct as well as the "attribute" containing 
the URL. And there are a few cases where the meaning of the attribute is 
different in different constructs.
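
To make that concrete, here is a minimal sketch of lexer-level URL tagging. 
It is not the crawler's actual code; the regular expressions, the 
REFERENCE_KIND table and the function name tag_urls are all assumptions made 
up for illustration. Note how "href" is classified differently depending on 
the enclosing tag.

```python
import re

# Hypothetical lexer rules: an opening tag with its raw attribute text,
# and a single name=value attribute (double-quoted, single-quoted, or bare).
TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>")
ATTR_RE = re.compile(r"""([a-zA-Z-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""")

# Assumed classification table: (enclosing tag, attribute) -> reference type.
# The same attribute name can mean different things in different tags.
REFERENCE_KIND = {
    ("img", "src"): "automatic",
    ("script", "src"): "automatic",
    ("link", "href"): "automatic",
    ("a", "href"): "manual",
}


def tag_urls(html):
    """Return (url, kind) pairs using only token-level matching, no grammar."""
    results = []
    for tag in TAG_RE.finditer(html):
        name = tag.group(1).lower()
        for attr in ATTR_RE.finditer(tag.group(2)):
            kind = REFERENCE_KIND.get((name, attr.group(1).lower()))
            if kind is None:
                continue  # not an attribute this sketch knows to carry a URL
            # Exactly one of the three value groups matched.
            url = next(g for g in attr.groups()[1:] if g is not None)
            results.append((url, kind))
    return results


if __name__ == "__main__":
    print(tag_urls('<a href="x.html">t</a><img src="p.png"><link href=s.css>'))
```

The only context needed is the name of the tag the attribute sits in, which 
a lexer already has in hand; nothing here tracks nesting.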

As far as lists or tables go, you can build an HTML tree without any 
parsing, so I don't see any requirement to parse HTML. (It might be easier 
to build a parser using a tool than a hand-constructed tree builder if one's 
needs are complex enough, but that doesn't change the underlying issue.)
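
One way to read "build a tree without parsing" is a hand-written builder 
that takes tokens from a lexer and keeps a stack of open elements, with no 
grammar involved. The sketch below assumes that reading; TOKEN_RE, the VOID 
set and build_tree are hypothetical names, and comments, scripts and implied 
end tags are deliberately ignored.

```python
import re

# Hypothetical lexer: either a tag (optionally a closing one) or a run of text.
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")

# Assumed subset of elements that never have a closing tag.
VOID = {"br", "img", "hr", "input", "meta", "link"}


def build_tree(html):
    """Build a nested dict tree from tokens using a stack of open elements."""
    root = {"tag": "#root", "children": []}
    stack = [root]
    for m in TOKEN_RE.finditer(html):
        closing, name, text = m.group(1), m.group(2), m.group(3)
        if text is not None:
            if text.strip():
                stack[-1]["children"].append(text.strip())
        elif closing:
            name = name.lower()
            # Close back to the nearest matching open element;
            # a stray close with no match is simply ignored.
            for depth in range(len(stack) - 1, 0, -1):
                if stack[depth]["tag"] == name:
                    del stack[depth:]
                    break
        else:
            node = {"tag": name.lower(), "children": []}
            stack[-1]["children"].append(node)
            if node["tag"] not in VOID:
                stack.append(node)  # stays open until its close is seen
    return root


if __name__ == "__main__":
    tree = build_tree("<ul><li>one</li><li>two</li></ul>")
    print(tree["children"][0])
```

That is enough structure to lay out a list or a table: each row or item is 
a child node of its container, and the builder never consults a grammar to 
get there.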

                         Randy.