From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!mx02.eternal-september.org!feeder.eternal-september.org!news.glorb.com!peer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail
From: Stephen Leake <stephen_leake@stephe-leake.org>
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
References: <878uc3r2y6.fsf@adaheads.sparre-andersen.dk>
 	<85twupvjxo.fsf@stephe-leake.org>
 	<81ceb070-16fe-4578-a09a-eb11a2bbb664@googlegroups.com>
 	<162zj7c2l0ykp$.1rxias18vby83.dlg@40tude.net>
 	<856172bk80.fsf@stephe-leake.org>
 	<1ljiyuuchbxvp.wrtbilkw3rdb.dlg@40tude.net>
 	<85pp4vakmy.fsf@stephe-leake.org>
 	<1a08qrccls0bi$.16y7q3hosklae.dlg@40tude.net>
Date: Wed, 17 Jun 2015 12:29:48 -0500
Message-ID: <85twu68cqb.fsf@stephe-leake.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (windows-nt)
Cancel-Lock: sha1:C/iXFikaV5EwERNwqnM5EYu6gFQ=
MIME-Version: 1.0
Content-Type: text/plain
X-Complaints-To: abuse@flashnewsgroups.com
Organization: FlashNewsgroups.com
X-Trace: 4180a5581ae8ee97f808409535
X-Received-Bytes: 3494
X-Received-Body-CRC: 3575449010
Xref: news.eternal-september.org comp.lang.ada:26357
List-Id: <comp.lang.ada>

"Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:

> On Tue, 16 Jun 2015 07:43:49 -0500, Stephen Leake wrote:
>
>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>> 
>> Here's the regular expression I use for Ada numeric literals:
>> 
>> "\([0-9]+#\)?[-+0-9a-fA-F.]+\(#\)?"
>> 
>> Given that you are at least a little familiar with regular expressions,
>> there's nothing hard about that.
>
> It is hard.

Ok, I gather you are not "at least a little familiar with regular
expressions". Too bad.

>> It does not enforce all the lexical rules for numbers; it allows
>> repeated, leading, and trailing underscores; it doesn't enforce pairs of
>> '#'.
>
> That is exactly the point. It does not parse the literal right

It's _not_ a "parser"; it's a "lexer".

Define "right". The line between lexer and parser is a design decision,
not set in stone. 

> and you have to
> reparse the matched chunk of text once again. What was the gain? 

Doing it this way allows reusing a regexp engine, which is easier than
writing a lexer from scratch.
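
To illustrate that split, here is a rough sketch only, not the OpenToken
code. Python's `re` module stands in for the regexp engine, and the
pattern is the one quoted above, translated from Emacs syntax with its
character class copied as quoted. The names `LOOSE_NUMBER`, `lex_number`,
and `hash_marks_paired` are hypothetical. The regexp finds the token, and
a small second pass checks one rule the regexp skips:

```python
import re

# Python translation of the Emacs-style pattern quoted earlier in this post;
# \( \) grouping becomes (?: ), and the character class is unchanged.
LOOSE_NUMBER = re.compile(r"(?:[0-9]+#)?[-+0-9a-fA-F.]+#?")


def lex_number(text, pos=0):
    """Lexer step: return the loosely matched numeric token at pos, or None."""
    match = LOOSE_NUMBER.match(text, pos)
    return match.group(0) if match else None


def hash_marks_paired(token):
    """Later step (assumed, for illustration): '#' must appear zero or two times."""
    return token.count("#") in (0, 2)


if __name__ == "__main__":
    for source in ("16#FF# ", "16#FF ", "3.14;"):
        token = lex_number(source)
        print(repr(token), hash_marks_paired(token))
```

The lexer accepts "16#FF" as a token even though the closing '#' is
missing; the second function is where that gets rejected. Which of the
two steps owns such a rule is the design decision discussed above.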

>> It took a few minutes to write the regular expression, and I reused the
>> tests; no bugs found yet.
>
>> Obviously, if you are not familiar with regular expressions, they will
>> be harder to write. But a software engineer should be willing to learn
>> the appropriate language for the job at hand.
>
> That is not the point. I am familiar with C, but I avoid writing
> anything in C. Regular expressions are a far worse language than C and,
> additionally, incapable of parsing Ada literals. Why bother with that
> mess?

As I have explained several times, I believe this approach is easier
than writing a lexer by hand; why bother with _that_ mess?

It's your choice. It would be nice if you could admit that other people can
make other choices, and still write good code.

Perhaps you could post lexer code that does this "right" by your
definition, so we could judge for ourselves?

>> I would guess that the average good programmer, starting with no
>> knowledge of either, can learn enough about regular expressions to write
>> the above faster than they can learn enough about writing scanners in
>> Ada to do that job well.
>
> The problem is that writing a correct pattern for anything more complex
> than trivial is hard even for people doing this on a daily basis. For an
> average programmer it is patently impossible.

"patently" is overkill here; this is simply not my experience.

-- 
-- Stephe