From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,13b4e394fcd91d4
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,ASCII-7-bit
Received: by 10.68.73.229 with SMTP id o5mr5931615pbv.7.1327747627664;
        Sat, 28 Jan 2012 02:47:07 -0800 (PST)
Path: 
 lh20ni233288pbb.0!nntp.google.com!news2.google.com!npeer03.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail
From: Stephen Leake <stephen_leake@stephe-leake.org>
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Handling the empty word token
References: 
 <62121d9d-f208-4e78-a109-749742da14a6@h12g2000yqg.googlegroups.com>
Date: Sat, 28 Jan 2012 05:46:45 -0500
Message-ID: <82k44cayui.fsf@stephe-leake.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (windows-nt)
Cancel-Lock: sha1:kQDAOhvtkF9FRNdq8y6IWjzGkl4=
MIME-Version: 1.0
X-Complaints-To: abuse@flashnewsgroups.com
Organization: FlashNewsgroups.com
X-Trace: 77a5a4f23d22be029e66109192
Content-Type: text/plain; charset=us-ascii
Date: 2012-01-28T05:46:45-05:00
List-Id: <comp.lang.ada>

mtrenkmann <martin.trenkmann@googlemail.com> writes:

> Very often grammars have so called epsilon-productions where one
> alternative for a non-terminal symbol points to the empty word
> (epsilon).

I've never seen this, but I haven't seen many grammars.

> For example: Optional -> Something | epsilon
>
> In OpenToken I modeled the epsilon token as an
> OpenToken.Recognizer.Nothing.Instance and defined the production like
> this:
>
> Optional <= Something and
> Optional <= epsilon

I've always coded this at a higher level. I'm guessing you want to have
something like this:

Settings <= required_settings & optional + action

I would code that as:

Settings <= required_settings + action and
Settings <= required_settings & optional + action

Which I assume you are doing as a workaround for what you want.

Supporting epsilon is an interesting idea. It does reduce the amount of
user-written code, especially when there is more than one optional part.

> Now I realized that the lexer would actually never emit the epsilon
> token, because of it's pure formal meaning, and thus the second
> production would never be detected.

Right.

> Is there a way to instrument the parser to silently accept the epsilon
> token whenever it expects it without consuming a token from the lexer,

It's been a while since I messed with OpenToken at that level, but it
looks like you could make this happen by changing
opentoken-recognizer-nothing.adb Analyze to report Verdict := Matches.
You might give that a try.

I suspect this would have the problem that Dmitry pointed out; the
epsilon token would always match, so you'd get lots of ambiguities.

To get rid of the ambiguities, you would need to feed information about
what the parser expects into the lexer, so Epsilon would only match if
the parser expects it. That is possible, but there's no formal mechanism
for it in OpenToken. 

You could define a new token type Epsilon, modeled on Nothing, and add a
Set_Expected procedure. Then call that from somewhere in
opentoken-production-parser-lalr.adb Parse; probably just before each
call to Find_Next. The list of expected tokens is given by the function
opentoken-production-parser-lalr.adb Expecting (currently used in error
messages).

If that works, it might be nice to define Set_Expected as a dispatching
operation on OpenToken.Recognizer.Instance, so other tokens could use
it. In that case, it might be worth it to cache the result of Expecting.
Or pass Action_List directly to Set_Expecting, instead of reformatting it.

> or is it a common convention to translate each grammar into a epsilon-
> free representation?

That's certainly how I do it.

-- 
-- Stephe