From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: 103376,13b4e394fcd91d4 X-Google-NewGroupId: yes X-Google-Attributes: gida07f3367d7,domainid0,public,usenet X-Google-Language: ENGLISH,UTF8 Received: by 10.68.73.229 with SMTP id o5mr13739162pbv.7.1328014735231; Tue, 31 Jan 2012 04:58:55 -0800 (PST) Path: lh20ni244839pbb.0!nntp.google.com!news2.google.com!border1.nntp.dca.giganews.com!nntp.giganews.com!npeer01.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail From: Stephen Leake Newsgroups: comp.lang.ada Subject: Re: OpenToken: Handling the empty word token References: <62121d9d-f208-4e78-a109-749742da14a6@h12g2000yqg.googlegroups.com> <82k44cayui.fsf@stephe-leake.org> <7756ad67-4ce1-4fb8-a4a1-f8136815ac9b@p13g2000yqd.googlegroups.com> Date: Tue, 31 Jan 2012 07:58:46 -0500 Message-ID: <82k448gha1.fsf@stephe-leake.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (windows-nt) Cancel-Lock: sha1:T527gUPBYIO9/XR7hfJLNlKtjJY= MIME-Version: 1.0 X-Complaints-To: abuse@flashnewsgroups.com Organization: FlashNewsgroups.com X-Trace: 100ee4f27e58fe029e66109630 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Date: 2012-01-31T07:58:46-05:00 List-Id: mtrenkmann writes: > On Jan 28, 11:46 am, Stephen Leake > wrote: >> mtrenkmann writes: >> > Very often grammars have so called epsilon-productions where one >> > alternative for a non-terminal symbol points to theemptyword >> > (epsilon). >> >> I've never seen this, but I haven't seen many grammars. >> > Here is the ASN.1 grammar I have to deal with (see Annex L): > http://www.itu.int/rec/T-REC-X.680-200811-I/en When OpenToken was first designed, the intent was to process small languages for system configuration. And that is how I'm using it at work. So I'm not surprised that it has a hard time with large languages. > Ok, thanks for all your helpful hints. In the meanwhile I came up with > the following workaround, that actually works fine. That looks like a reasonable workaround and proof of principle, but not a long-term solution. > I am wondering if that behavior cannot be implemented in the table- > lookup procedure. That is when I call Action_For with an unexpected > token, while there is the epsilon as one of the expected tokens, then > this epsilon could be accepted automatically, the parser changes its > state accordingly, and the unexpected token is checked again. Don't > know if that is possible, since I am not yet familiar with the > internals of table-driven parsing. It would certainly be more efficient than your current code. I suspect making epsilon an explicit token in the parser but not the lexer is the right way to go. I suggest you get "the red dragon book" (en.wikipedia.org/wiki/Principles_of_Compiler_Design [1]), and read chapter 5 (you probably have to read the earlier chapters also to understand it). Skimming thru chapter 5, I don't see any discussion on how to handle epsilon, but it must be in there somewhere. Example 5.36 has an example grammar with an epsilon. Implement that in an OpenToken test (or pick a small subset of your real language), and then see if you can figure out how to integrate it into the state machine. I find small tests immensely helpful in understanding the code. See test_lr0_kernels.adb for helpful techniques; setting Debug => True turns on lots of useful output. Another thing to be aware of; the lexer in OpenToken is horribly inefficient. So if you are planning to parse large amounts of text, you should consider replacing (or improving) the lexer. It would be interesting to see how hard that is in OpenToken, or you may want to switch to another generator (although I am not aware of any others written in and generating Ada). [1] The purple dragon book may be better; it's a lot more recent, but the blurb indicates it's less focussed on actually implementing parsers. And some of the reviews on Amazon imply there are better books with better techniques out there. So you may want to look for a better book; there are certainly plenty of compiler books on Amazon. -- -- Stephe