Re: OpenToken: Handling the empty word token

comp.lang.ada
 help / color / mirror / Atom feed

From: Stephen Leake <stephen_leake@stephe-leake.org>
Subject: Re: OpenToken: Handling the empty word token
Date: Sat, 28 Jan 2012 05:46:45 -0500
Date: 2012-01-28T05:46:45-05:00	[thread overview]
Message-ID: <82k44cayui.fsf@stephe-leake.org> (raw)
In-Reply-To: 62121d9d-f208-4e78-a109-749742da14a6@h12g2000yqg.googlegroups.com

mtrenkmann <martin.trenkmann@googlemail.com> writes:

> Very often grammars have so called epsilon-productions where one
> alternative for a non-terminal symbol points to the empty word
> (epsilon).

I've never seen this, but I haven't seen many grammars.

> For example: Optional -> Something | epsilon
>
> In OpenToken I modeled the epsilon token as an
> OpenToken.Recognizer.Nothing.Instance and defined the production like
> this:
>
> Optional <= Something and
> Optional <= epsilon

I've always coded this at a higher level. I'm guessing you want to have
something like this:

Settings <= required_settings & optional + action

I would code that as:

Settings <= required_settings + action and
Settings <= required_settings & optional + action

Which I assume you are doing as a workaround for what you want.

Supporting epsilon is an interesting idea. It does reduce the amount of
user-written code, especially when there is more than one optional part.

> Now I realized that the lexer would actually never emit the epsilon
> token, because of it's pure formal meaning, and thus the second
> production would never be detected.

Right.

> Is there a way to instrument the parser to silently accept the epsilon
> token whenever it expects it without consuming a token from the lexer,

It's been a while since I messed with OpenToken at that level, but it
looks like you could make this happen by changing
opentoken-recognizer-nothing.adb Analyze to report Verdict := Matches.
You might give that a try.

I suspect this would have the problem that Dmitry pointed out; the
epsilon token would always match, so you'd get lots of ambiguities.

To get rid of the ambiguities, you would need to feed information about
what the parser expects into the lexer, so Epsilon would only match if
the parser expects it. That is possible, but there's no formal mechanism
for it in OpenToken. 

You could define a new token type Epsilon, modeled on Nothing, and add a
Set_Expected procedure. Then call that from somewhere in
opentoken-production-parser-lalr.adb Parse; probably just before each
call to Find_Next. The list of expected tokens is given by the function
opentoken-production-parser-lalr.adb Expecting (currently used in error
messages).

If that works, it might be nice to define Set_Expected as a dispatching
operation on OpenToken.Recognizer.Instance, so other tokens could use
it. In that case, it might be worth it to cache the result of Expecting.
Or pass Action_List directly to Set_Expecting, instead of reformatting it.

> or is it a common convention to translate each grammar into a epsilon-
> free representation?

That's certainly how I do it.

-- 
-- Stephe

next prev parent reply	other threads:[~2012-01-28 10:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-27 16:22 OpenToken: Handling the empty word token mtrenkmann
2012-01-27 16:48 ` Dmitry A. Kazakov
2012-01-28  3:42   ` Randy Brukardt
2012-01-29 17:45     ` Stephen Leake
2012-01-31  0:56       ` Randy Brukardt
2012-01-31  9:09         ` Georg Bauhaus
2012-01-31 12:16         ` Stephen Leake
2012-02-02  1:39           ` Randy Brukardt
2012-01-28 10:46 ` Stephen Leake [this message]
2012-01-30 16:28   ` mtrenkmann
2012-01-30 18:34     ` Dmitry A. Kazakov
2012-01-31 12:58     ` Stephen Leake

replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox