From: mtrenkmann
Newsgroups: comp.lang.ada
Subject: OpenToken: Handling the empty word token
Date: Fri, 27 Jan 2012 08:22:12 -0800 (PST)

Hello all.

Very often grammars have so-called epsilon-productions, where one alternative for a non-terminal symbol derives the empty word (epsilon).
For example:

    Optional -> Something | epsilon

In OpenToken I modeled the epsilon token as an OpenToken.Recognizer.Nothing.Instance and defined the productions like this:

    Optional <= Something and
    Optional <= epsilon

Now I have realized that the lexer will never actually emit the epsilon token, because its meaning is purely formal, and thus the second production will never be matched.

Is there a way to instruct the parser to silently accept the epsilon token whenever it expects one, without consuming a token from the lexer? Or is it common convention to translate each grammar into an epsilon-free representation?

Thanks in advance.

-- Martin
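
For reference, the epsilon-free translation asked about above can be sketched independently of OpenToken. The following is a minimal, language-agnostic illustration in Python of the textbook transformation: find the nullable non-terminals, then expand every production into the variants with each nullable symbol kept or dropped. It does not use or describe any OpenToken API. The representation (a dict mapping a non-terminal to a list of tuples) and the names Statement, Keyword and Semicolon are assumptions made up for the example; only Optional and Something come from the post.

```python
from itertools import product


def nullable_symbols(grammar):
    """Return the set of non-terminals that can derive the empty word."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A head is nullable if some body consists only of nullable
            # symbols; the empty body () qualifies trivially.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar):
    """Return a grammar without epsilon-productions.

    The empty word itself is no longer derivable, so a nullable start
    symbol would need separate handling by the caller.
    """
    nullable = nullable_symbols(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Nullable symbols may be kept or dropped; others are always kept.
            choices = [((sym,), ()) if sym in nullable else ((sym,),)
                       for sym in body]
            for combo in product(*choices):
                new_body = tuple(sym for part in combo for sym in part)
                if new_body:  # discard the epsilon alternative itself
                    new_bodies.add(new_body)
        result[head] = new_bodies
    return result


# Hypothetical grammar: Statement -> Keyword Optional Semicolon
#                       Optional  -> Something | epsilon
grammar = {
    "Statement": [("Keyword", "Optional", "Semicolon")],
    "Optional": [("Something",), ()],
}
epsilon_free = eliminate_epsilon(grammar)
for head, bodies in epsilon_free.items():
    for body in sorted(bodies):
        print(head, "->", " ".join(body))
```

In the transformed grammar, Optional keeps only its Something alternative, and Statement gains a second alternative without Optional, so the lexer never needs to produce an epsilon token. A non-terminal whose only alternative is epsilon ends up with no productions at all; removing such leftover symbols is not covered by this sketch.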