From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: 103376,13b4e394fcd91d4 X-Google-NewGroupId: yes X-Google-Attributes: gida07f3367d7,domainid0,public,usenet X-Google-Language: ENGLISH,ASCII-7-bit Received: by 10.68.73.229 with SMTP id o5mr5931615pbv.7.1327747627664; Sat, 28 Jan 2012 02:47:07 -0800 (PST) Path: lh20ni233288pbb.0!nntp.google.com!news2.google.com!npeer03.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail From: Stephen Leake Newsgroups: comp.lang.ada Subject: Re: OpenToken: Handling the empty word token References: <62121d9d-f208-4e78-a109-749742da14a6@h12g2000yqg.googlegroups.com> Date: Sat, 28 Jan 2012 05:46:45 -0500 Message-ID: <82k44cayui.fsf@stephe-leake.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (windows-nt) Cancel-Lock: sha1:kQDAOhvtkF9FRNdq8y6IWjzGkl4= MIME-Version: 1.0 X-Complaints-To: abuse@flashnewsgroups.com Organization: FlashNewsgroups.com X-Trace: 77a5a4f23d22be029e66109192 Content-Type: text/plain; charset=us-ascii Date: 2012-01-28T05:46:45-05:00 List-Id: mtrenkmann writes: > Very often grammars have so called epsilon-productions where one > alternative for a non-terminal symbol points to the empty word > (epsilon). I've never seen this, but I haven't seen many grammars. > For example: Optional -> Something | epsilon > > In OpenToken I modeled the epsilon token as an > OpenToken.Recognizer.Nothing.Instance and defined the production like > this: > > Optional <= Something and > Optional <= epsilon I've always coded this at a higher level. I'm guessing you want to have something like this: Settings <= required_settings & optional + action I would code that as: Settings <= required_settings + action and Settings <= required_settings & optional + action Which I assume you are doing as a workaround for what you want. Supporting epsilon is an interesting idea. It does reduce the amount of user-written code, especially when there is more than one optional part. > Now I realized that the lexer would actually never emit the epsilon > token, because of it's pure formal meaning, and thus the second > production would never be detected. Right. > Is there a way to instrument the parser to silently accept the epsilon > token whenever it expects it without consuming a token from the lexer, It's been a while since I messed with OpenToken at that level, but it looks like you could make this happen by changing opentoken-recognizer-nothing.adb Analyze to report Verdict := Matches. You might give that a try. I suspect this would have the problem that Dmitry pointed out; the epsilon token would always match, so you'd get lots of ambiguities. To get rid of the ambiguities, you would need to feed information about what the parser expects into the lexer, so Epsilon would only match if the parser expects it. That is possible, but there's no formal mechanism for it in OpenToken. You could define a new token type Epsilon, modeled on Nothing, and add a Set_Expected procedure. Then call that from somewhere in opentoken-production-parser-lalr.adb Parse; probably just before each call to Find_Next. The list of expected tokens is given by the function opentoken-production-parser-lalr.adb Expecting (currently used in error messages). If that works, it might be nice to define Set_Expected as a dispatching operation on OpenToken.Recognizer.Instance, so other tokens could use it. In that case, it might be worth it to cache the result of Expecting. Or pass Action_List directly to Set_Expecting, instead of reformatting it. > or is it a common convention to translate each grammar into a epsilon- > free representation? That's certainly how I do it. -- -- Stephe