From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
Date: Wed, 3 Jun 2015 09:36:14 +0200
Organization: cbb software GmbH
Message-ID: <162zj7c2l0ykp$.1rxias18vby83.dlg@40tude.net>
References: <878uc3r2y6.fsf@adaheads.sparre-andersen.dk> <85twupvjxo.fsf@stephe-leake.org> <81ceb070-16fe-4578-a09a-eb11a2bbb664@googlegroups.com>
Reply-To: mailbox@dmitry-kazakov.de
User-Agent: 40tude_Dialog/2.0.15.1

On Tue, 2 Jun 2015 18:43:50 -0700 (PDT), Shark8 wrote:

> On Tuesday, June 2, 2015 at 4:12:37 PM UTC-6, Stephen Leake wrote:
>>
>> Obvious to me, but I've been messing with the lexer code in FastToken
>> recently; I switched to using regular expressions for the token
>> recognizers (to be closer to Aflex, which is also now supported). While
>> doing that, I deleted the default Get above.
>>
>> Using regular expressions instead of the old OpenToken recognizers is a
>> big change, but wisi-generate takes care of most of the drudge work for
>> you (you just have to specify the actual regular expressions).
>
> Is that a good idea? No.
>
> From my experience [mainly maintenance] RegEx is almost always a bad
> solution (though I will grant that most of my encounters w/ it involved
> applying it as a formatting/parsing tool for items that generally weren't
> amenable to such breakdowns [street addresses, for example, are good at
> containing info/formatting that kills a simple regex]).

Yes. Maintenance is one problem; another is that the class of languages
recognized by regular expressions is far too weak. More powerful pattern
languages, e.g. SNOBOL patterns, are slower.

In the end it is always worth the effort to write a token scanner by
hand. Firstly, there are not that many things you would have to
recognize this way. Secondly, it is much more efficient than pattern
matching. Thirdly, it allows sane error messages, because there are
usually more outcomes than just matched vs. not matched, e.g. a
malformed identifier or a missing quotation mark.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
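
[Editor's sketch, for illustration: a minimal hand-written scanner of the
kind described above, in Ada. The names Scan_Demo, Scan_Identifier, Line
and Pos are hypothetical, not taken from OpenToken or FastToken. It
recognizes a single Ada identifier and reports a specific diagnostic,
e.g. a malformed identifier, rather than a bare "no match":]

   with Ada.Text_IO;             use Ada.Text_IO;
   with Ada.Characters.Handling; use Ada.Characters.Handling;

   procedure Scan_Demo is
      Line : constant String := "Foo_Bar_ rest";  --  malformed identifier
      Pos  : Positive := Line'First;              --  scanner position

      --  Scan one Ada identifier: letter (letter | digit | '_')*,
      --  with no trailing or doubled underscore.
      procedure Scan_Identifier is
         First : constant Positive := Pos;
      begin
         if not Is_Letter (Line (Pos)) then
            Put_Line ("error: identifier must start with a letter");
            return;
         end if;
         while Pos <= Line'Last
           and then (Is_Alphanumeric (Line (Pos)) or else Line (Pos) = '_')
         loop
            if Line (Pos) = '_'
              and then (Pos = Line'Last
                        or else not Is_Alphanumeric (Line (Pos + 1)))
            then
               --  A regex would merely fail to match here; the manual
               --  scanner can say exactly what is wrong and where.
               Put_Line ("error: malformed identifier at" &
                         Positive'Image (Pos) & " (stray underscore)");
               return;
            end if;
            Pos := Pos + 1;
         end loop;
         Put_Line ("identifier: " & Line (First .. Pos - 1));
      end Scan_Identifier;
   begin
      Scan_Identifier;
   end Scan_Demo;

[Built with GNAT, this should report the stray underscore at position 8.
Adding analogous Scan_Number, Scan_String etc. procedures would yield a
complete scanner with equally specific diagnostics.]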