From: "Dmitry A. Kazakov"
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
Date: Wed, 3 Jun 2015 09:36:14 +0200
Organization: cbb software GmbH
Message-ID: <162zj7c2l0ykp$.1rxias18vby83.dlg@40tude.net>
References: <878uc3r2y6.fsf@adaheads.sparre-andersen.dk> <85twupvjxo.fsf@stephe-leake.org> <81ceb070-16fe-4578-a09a-eb11a2bbb664@googlegroups.com>
Reply-To: mailbox@dmitry-kazakov.de
User-Agent: 40tude_Dialog/2.0.15.1

On Tue, 2 Jun 2015 18:43:50 -0700 (PDT), Shark8 wrote:

> On Tuesday, June 2, 2015 at 4:12:37 PM UTC-6, Stephen Leake wrote:
>>
>> Obvious to me, but I've been messing with the lexer code in FastToken
>> recently; I switched to using regular expressions for the token
>> recognizers (to be closer to Aflex, which is also now supported). While
>> doing that, I deleted the default Get above.
>>
>> Using regular expressions instead of the old OpenToken recognizers is a
>> big change, but wisi-generate takes care of most of the drudge work for
>> you (you just have to specify the actual regular expressions).
>
> Is that a good idea? No.
>
> From my experience [mainly maintenance] RegEx is almost always a bad
> solution (though I will grant that most of my encounters w/ it involved
> applying it as a formatting/parsing tool for items that generally weren't
> amenable to such breakdowns [street addresses, for example, are good at
> containing info/formatting that kills a simple regex]).

Yes. Maintenance is one problem; another is that the class of languages
recognized by regular expressions is far too weak. More powerful pattern
languages, e.g. SNOBOL patterns, are slower.

In the end it is always worth the effort to write a token scanner by
hand. Firstly, there are not that many things you would have to
recognize this way. Secondly, it is much more efficient than pattern
matching. Thirdly, it allows sane error messages, because there are
usually more outcomes than just matched vs. not matched, e.g. a
malformed identifier or a missing quotation mark.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
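
[Editor's sketch, for illustration: a minimal hand-written scanner of the
kind described above, in Ada. The names Scan_Demo, Scan_Identifier, Line
and Pos are hypothetical, not taken from OpenToken or FastToken. It
recognizes a single Ada identifier and reports a specific diagnostic,
e.g. a malformed identifier, rather than a bare "no match":]

   with Ada.Text_IO;             use Ada.Text_IO;
   with Ada.Characters.Handling; use Ada.Characters.Handling;

   procedure Scan_Demo is
      Line : constant String := "Foo_Bar_ rest";  --  malformed identifier
      Pos  : Positive := Line'First;              --  scanner position

      --  Scan one Ada identifier: letter (letter | digit | '_')*,
      --  with no trailing or doubled underscore.
      procedure Scan_Identifier is
         First : constant Positive := Pos;
      begin
         if not Is_Letter (Line (Pos)) then
            Put_Line ("error: identifier must start with a letter");
            return;
         end if;
         while Pos <= Line'Last
           and then (Is_Alphanumeric (Line (Pos)) or else Line (Pos) = '_')
         loop
            if Line (Pos) = '_'
              and then (Pos = Line'Last
                        or else not Is_Alphanumeric (Line (Pos + 1)))
            then
               --  A regex would merely fail to match here; the manual
               --  scanner can say exactly what is wrong and where.
               Put_Line ("error: malformed identifier at" &
                         Positive'Image (Pos) & " (stray underscore)");
               return;
            end if;
            Pos := Pos + 1;
         end loop;
         Put_Line ("identifier: " & Line (First .. Pos - 1));
      end Scan_Identifier;
   begin
      Scan_Identifier;
   end Scan_Demo;

[Built with GNAT, this should report the stray underscore at position 8.
Adding analogous Scan_Number, Scan_String etc. procedures would yield a
complete scanner with equally specific diagnostics.]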