From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: 103376,13b4e394fcd91d4 X-Google-NewGroupId: yes X-Google-Attributes: gida07f3367d7,domainid0,public,usenet X-Google-Language: ENGLISH,ASCII-7-bit Received: by 10.68.73.229 with SMTP id o5mr12419559pbv.7.1327971403534; Mon, 30 Jan 2012 16:56:43 -0800 (PST) MIME-Version: 1.0 From: "Randy Brukardt" Newsgroups: comp.lang.ada Subject: Re: OpenToken: Handling the empty word token Date: Mon, 30 Jan 2012 18:56:36 -0600 Organization: Jacob Sparre Andersen Research & Innovation Message-ID: References: <62121d9d-f208-4e78-a109-749742da14a6@h12g2000yqg.googlegroups.com> <1jvlv7i0tn14u.b5d2cwsqhl2h$.dlg@40tude.net> <82ehuibdwt.fsf@stephe-leake.org> NNTP-Posting-Host: static-69-95-181-76.mad.choiceone.net X-Trace: munin.nbi.dk 1327971401 2297 69.95.181.76 (31 Jan 2012 00:56:41 GMT) X-Complaints-To: news@jacob-sparre.dk NNTP-Posting-Date: Tue, 31 Jan 2012 00:56:41 +0000 (UTC) X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2900.5931 X-RFC2646: Format=Flowed; Original X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Path: lh20ni242994pbb.0!nntp.google.com!news1.google.com!news4.google.com!proxad.net!feeder1-2.proxad.net!feed.ac-versailles.fr!news.ecp.fr!news.jacob-sparre.dk!pnx.dk!jacob-sparre.dk!ada-dk.org!.POSTED!not-for-mail Date: 2012-01-30T18:56:36-06:00 List-Id: "Stephen Leake" wrote in message news:82ehuibdwt.fsf@stephe-leake.org... > "Randy Brukardt" writes: ... >> I'd be surprised if OpenToken didn't have something similar; > > Not quite. Because OpenToken uses Ada types to build the grammar, we > need an explicit Epsilon token (full code below): I see. It appears that OpenToken uses an LL-derived parsing scheme. (It appears to be a version of recursive descent, which is an LL scheme.] And it's important to note that LL parsing cannot include epsilon tokens and have problems with left-recursion. LR-derived parsing does not have either of these limitations. And note that there is no real workaround to these limitations (short of rewriting your grammar); they're inherent in the parsing technique. [Note: I learned all about parsing back in the early 1980's, so (1) I may be remembering something wrong, or (2) the state of the art may have changed some.] I would suggest that the OP use a LR-based grammar generator. That won't be quite so "pretty" as using "Ada types", but it will work well with almost any original grammar that you can throw at it. (I believe that modern parser generators use LR(1) tables rather than the LALR(1) tables used in the past; LR(1) tables are a lot bigger to generate and we simply didn't have enough memory to use that back in the day.) Hacking an LL-based scheme (such as OpenToken's) to work is pretty much doomed to failure. You might get it to work for limited circumstances, but I don't think it would work in any general sort of way. Randy.