comp.lang.ada
From: "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de>
Subject: Re: OpenToken: Parsing Ada (subset)?
Date: Wed, 17 Jun 2015 21:03:59 +0200
Message-ID: <1ucuzb8jv2ibe.4awaxtp8eab6.dlg@40tude.net>
In-Reply-To: 85twu68cqb.fsf@stephe-leake.org

On Wed, 17 Jun 2015 12:29:48 -0500, Stephen Leake wrote:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
> 
>> On Tue, 16 Jun 2015 07:43:49 -0500, Stephen Leake wrote:
>>
>>> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> writes:
>>> 
>>> Here's the regular expression I use for Ada numeric literals:
>>> 
>>> "\([0-9]+#\)?[-+0-9a-fA-F.]+\(#\)?"
>>> 
>>> Given that you are at least a little familiar with regular expressions,
>>> there's nothing hard about that.
>>
>> It is hard.
> 
> Ok, I gather you are not "at least a little familiar with regular
> expressions".

*Nobody* is familiar enough with them to be sure that the language generated
by a pattern like the one above is that of Ada numeric literals. Note "like",
because your pattern obviously does not generate the Ada numeric literals.

Things are actually much worse than mere complexity. It is a combination of
complexity and weakness. Regular expressions cannot handle things like Ada
literals. Thus the patterns actually used are only approximations of what is
required. The designer must know how the generated language differs from the
required one. And the reader must read not only the program but also the
mind of the pattern designer.

>>> It does not enforce all the lexical rules for numbers; it allows
>>> repeated, leading, and trailing underscores; it doesn't enforce pairs of
>>> '#'.
>>
>> That is exactly the point. It does not parse literal right 
> 
> It's _not_ a "parser"; it's a "lexer".
> 
> Define "right".

def Right:

No false positives, no false negatives <=> Rejects only illegal literals,
accepts only legal literals.
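
To make the definition concrete, here is a small sketch that lists the false
positives and false negatives of the pattern exactly as quoted above,
transliterated from Emacs to Python syntax. The sample list, its hand-assigned
labels, and the choice of whole-string matching are assumptions made for
illustration; they are not part of the original exchange.

```python
import re

# The quoted Emacs-style pattern, transliterated to Python syntax.
PATTERN = re.compile(r"([0-9]+#)?[-+0-9a-fA-F.]+(#)?")

# Hand-labelled samples (an assumption): text -> legal Ada numeric literal?
SAMPLES = {
    "12": True,
    "1_000": True,
    "3.14": True,
    "1E6": True,
    "16#FF#": True,
    "2#1010#E3": True,
    "16#FF": False,   # unmatched '#'
    "2#9#": False,    # 9 is not a digit in base 2
    "1.2.3": False,   # two points
    "+-+": False,     # no digits at all
    "ABC": False,     # an identifier, not a number
}


def classify(samples):
    """Return (false_positives, false_negatives) for PATTERN on the samples."""
    false_pos, false_neg = [], []
    for text, legal in samples.items():
        accepted = PATTERN.fullmatch(text) is not None
        if accepted and not legal:
            false_pos.append(text)
        if legal and not accepted:
            false_neg.append(text)
    return false_pos, false_neg


if __name__ == "__main__":
    fp, fn = classify(SAMPLES)
    print("false positives:", fp)
    print("false negatives:", fn)
```

On these samples the pattern as quoted accepts all five illegal strings and
rejects two legal ones ("1_000" and "2#1010#E3"), so it is not "right" in the
sense defined above.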

> The line between lexer and parser is a design decision,
> not set in stone. 

True, but we are not talking about higher-level things like maximum
fraction length supported. Simple lexical stuff like:

- matching '#'s
- non-repeating '_'s
- valid base number
- the set of digits corresponding to the base
etc.

is all beyond the power of regular expressions (unlike SNOBOL patterns).
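
For comparison, a hand-written scanner that enforces the four checks listed
above might look like the sketch below. It follows one reading of the numeric
literal syntax in the Ada Reference Manual (section 2.4); the function names
are hypothetical, and the obsolescent ':' replacement for '#' is deliberately
not handled.

```python
DIGITS = "0123456789ABCDEF"


def _digit(text, i, base):
    """Value of the extended digit at text[i], or None if absent or >= base."""
    if i < len(text):
        d = DIGITS.find(text[i].upper())
        if 0 <= d < base:
            return d
    return None


def _numeral(text, i, base):
    """Scan digit {[_] digit}; return (end, value), or None if malformed."""
    d = _digit(text, i, base)
    if d is None:
        return None
    value = d
    i += 1
    while True:
        if i < len(text) and text[i] == "_":
            i += 1                       # an underscore must be followed by a digit
            d = _digit(text, i, base)
            if d is None:
                return None              # trailing or repeated underscore
        else:
            d = _digit(text, i, base)
            if d is None:
                return i, value          # numeral ends here
        value = value * base + d
        i += 1


def _exponent(text, i, allow_minus):
    """Scan an optional E[+|-]numeral; return its end, or None if malformed."""
    if i >= len(text) or text[i] not in "eE":
        return i                         # no exponent present
    i += 1
    if i < len(text) and text[i] in "+-":
        if text[i] == "-" and not allow_minus:
            return None                  # integer literal with negative exponent
        i += 1
    r = _numeral(text, i, 10)
    return None if r is None else r[0]


def scan_numeric_literal(text, pos=0):
    """Return the end index of the legal literal at text[pos:], or None."""
    r = _numeral(text, pos, 10)
    if r is None:
        return None
    i, base = r
    is_real = False
    if i < len(text) and text[i] == "#":
        if not 2 <= base <= 16:
            return None                  # invalid base
        r = _numeral(text, i + 1, base)  # digits must belong to the base
        if r is None:
            return None
        i = r[0]
        if i < len(text) and text[i] == ".":
            r = _numeral(text, i + 1, base)
            if r is None:
                return None
            i, is_real = r[0], True
        if i >= len(text) or text[i] != "#":
            return None                  # unmatched '#'
        i += 1
    elif i < len(text) and text[i] == "." and _digit(text, i + 1, 10) is not None:
        r = _numeral(text, i + 1, 10)
        if r is None:
            return None
        i, is_real = r[0], True
    i = _exponent(text, i, allow_minus=is_real)
    if i is None:
        return None
    if i < len(text) and (text[i].isalnum() or text[i] in "_#"):
        return None                      # literal runs into another token
    return i


def is_legal(text):
    """True if the whole string is exactly one legal numeric literal."""
    return scan_numeric_literal(text) == len(text)
```

For example, `scan_numeric_literal("x := 16#FF#;", 5)` returns 11, the index
just past the lexeme, so the end of the literal is found in the same pass that
validates it.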

>> and you have to
>> reparse the matched chunk of text once again. What was the gain? 
> 
> Doing it this way allows reusing a regexp engine, which is easier than
> writing a lexer from scratch.

You still have to parse it again. With or without regular expressions, you
have to do that. The only difference is in detecting the end of the lexeme,
which is not a problem at all for a manually written scanner.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
