From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Thread: 103376,13b4e394fcd91d4
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,UTF8
Received: by 10.68.73.229 with SMTP id o5mr13739162pbv.7.1328014735231;
        Tue, 31 Jan 2012 04:58:55 -0800 (PST)
Path: 
 lh20ni244839pbb.0!nntp.google.com!news2.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail
From: Stephen Leake <stephen_leake@stephe-leake.org>
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Handling the empty word token
References: 
 <62121d9d-f208-4e78-a109-749742da14a6@h12g2000yqg.googlegroups.com>
 	<82k44cayui.fsf@stephe-leake.org>
 	<7756ad67-4ce1-4fb8-a4a1-f8136815ac9b@p13g2000yqd.googlegroups.com>
Date: Tue, 31 Jan 2012 07:58:46 -0500
Message-ID: <82k448gha1.fsf@stephe-leake.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (windows-nt)
Cancel-Lock: sha1:T527gUPBYIO9/XR7hfJLNlKtjJY=
MIME-Version: 1.0
X-Complaints-To: abuse@flashnewsgroups.com
Organization: FlashNewsgroups.com
X-Trace: 100ee4f27e58fe029e66109630
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Date: 2012-01-31T07:58:46-05:00
List-Id: <comp.lang.ada>

mtrenkmann <martin.trenkmann@googlemail.com> writes:

> On Jan 28, 11:46 am, Stephen Leake <stephen_le...@stephe-leake.org>
> wrote:
>> mtrenkmann <martin.trenkm...@googlemail.com> writes:
>> > Very often grammars have so called epsilon-productions where one
>> > alternative for a non-terminal symbol points to theemptyword
>> > (epsilon).
>>
>> I've never seen this, but I haven't seen many grammars.
>>
> Here is the ASN.1 grammar I have to deal with (see Annex L):
> http://www.itu.int/rec/T-REC-X.680-200811-I/en

When OpenToken was first designed, the intent was to process small
languages for system configuration. And that is how I'm using it at
work. So I'm not surprised that it has a hard time with large languages.

> Ok, thanks for all your helpful hints. In the meanwhile I came up with
> the following workaround, that actually works fine.

<snip>

That looks like a reasonable workaround and proof of principle, but not
a long-term solution.

> I am wondering if that behavior cannot be implemented in the table-
> lookup procedure. That is when I call Action_For with an unexpected
> token, while there is the epsilon as one of the expected tokens, then
> this epsilon could be accepted automatically, the parser changes its
> state accordingly, and the unexpected token is checked again. Don't
> know if that is possible, since I am not yet familiar with the
> internals of table-driven parsing.

It would certainly be more efficient than your current code.

I suspect making epsilon an explicit token in the parser but not the
lexer is the right way to go.

I suggest you get "the red dragon book"
(en.wikipedia.org/wiki/Principles_of_Compiler_Design [1]), and read
chapter 5 (you probably have to read the earlier chapters also to
understand it).

Skimming thru chapter 5, I don't see any discussion on how to handle
epsilon, but it must be in there somewhere.

Example 5.36 has an example grammar with an epsilon. Implement that in
an OpenToken test (or pick a small subset of your real language), and
then see if you can figure out how to integrate it into the state
machine. 

I find small tests immensely helpful in understanding the code. See
test_lr0_kernels.adb for helpful techniques; setting Debug => True turns
on lots of useful output.


Another thing to be aware of; the lexer in OpenToken is horribly
inefficient. So if you are planning to parse large amounts of text, you
should consider replacing (or improving) the lexer. It would be
interesting to see how hard that is in OpenToken, or you may want to
switch to another generator (although I am not aware of any others
written in and generating Ada).

[1] The purple dragon book may be better; it's a lot more recent, but
the blurb indicates it's less focussed on actually implementing parsers.
And some of the reviews on Amazon imply there are better books with
better techniques out there. So you may want to look for a better book;
there are certainly plenty of compiler books on Amazon.

-- 
-- Stephe