From: gautier_niouzes@hotmail.com
Newsgroups: comp.lang.ada
Subject: Re: Encaspulation: What to export
Date: Wed, 29 Nov 2017 13:08:03 -0800 (PST)
Message-ID: <1aab7965-08cf-472f-9322-bfabb6f2c728@googlegroups.com>
References: <8666203a-4e42-438d-8fe0-1a63f643955f@googlegroups.com>

Randy Brukardt:

> Really? We don't have any parser (just a lexer) in the search engine
> crawler. As I recall, section closes are counted rather than anything more
> complex.

Sure. You could say the same for gathering identifiers from Ada sources for a search engine for Ada sources. A lexer is OK for that job. Would you conclude that Ada doesn't need to be parsed?
Or conversely, how would you manage to display HTML lists or tables without a parser? With a search engine crawler you just throw away the HTML structure. That is fine for your crawler: you only need the text between the tags.
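
To make the distinction concrete, here is a minimal sketch, in Python rather than Ada, and not taken from any real crawler or browser. The names `tokens`, `crawler_text` and `render_lists` are hypothetical, and the regular expression is a deliberate simplification that ignores comments, scripts, entities and attribute values containing `>`. The crawler side only filters the token stream; the display side has to remember which lists are open, which is the beginning of a parser.

```python
import re

# Simplified tokenizer (an assumption for illustration, not real HTML lexing):
# either a tag such as <ul> or </li>, or a run of text up to the next "<".
TOKEN = re.compile(r"<(/?)(\w+)[^>]*>|([^<]+)")

def tokens(html):
    """Yield ("open", tag), ("close", tag) or ("text", value) tuples."""
    for m in TOKEN.finditer(html):
        close, tag, text = m.groups()
        if tag:
            yield ("close" if close else "open", tag.lower())
        elif text.strip():
            yield ("text", text.strip())

def crawler_text(html):
    """Crawler view: keep the text between the tags, drop the structure."""
    return " ".join(value for kind, value in tokens(html) if kind == "text")

def render_lists(html):
    """Display view: track open lists on a stack to indent and number items."""
    stack = []   # one [tag, item_count] entry per currently open list
    lines = []
    for kind, value in tokens(html):
        if kind == "open" and value in ("ul", "ol"):
            stack.append([value, 0])
        elif kind == "close" and value in ("ul", "ol"):
            if stack:
                stack.pop()
        elif kind == "open" and value == "li" and stack:
            stack[-1][1] += 1
        elif kind == "text":
            if stack:
                tag, count = stack[-1]
                marker = f"{count}." if tag == "ol" else "-"
                lines.append("  " * (len(stack) - 1) + f"{marker} {value}")
            else:
                lines.append(value)
    return lines

sample = "<ol><li>one<ul><li>a</li><li>b</li></ul></li><li>two</li></ol>"
print(crawler_text(sample))
print("\n".join(render_lists(sample)))
```

Both functions consume the same tokens; only the second needs to know where each token sits in the nesting, and tables would need more of the same.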