From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Thread: a07f3367d7,f46e95f7ed68fc6b X-Google-Attributes: gida07f3367d7,public,usenet X-Google-NewGroupId: yes X-Google-Language: ENGLISH,ASCII-7-bit Path: g2news2.google.com!news1.google.com!npeer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail Newsgroups: comp.lang.ada Subject: Re: working towards OpenToken version 4 References: <2df50994-4bed-4396-aafb-4efc31dd02b6@c1g2000yqi.googlegroups.com> From: Stephen Leake Date: Wed, 29 Jul 2009 21:39:39 -0400 Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (windows-nt) Cancel-Lock: sha1:C4rOWbcqERmoyI0SxcIjEarxYks= MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Complaints-To: abuse@flashnewsgroups.com Organization: FlashNewsgroups.com X-Trace: 393bc4a70f9e6e197caa724274 Xref: g2news2.google.com comp.lang.ada:7429 Date: 2009-07-29T21:39:39-04:00 List-Id: Ludovic Brenta writes: > Stephen Leake wrote on comp.lang.ada: >> I've been thinking about ways to improve the recursive descent support >> in OpenToken. > > Great. I like recursive descent parsers :) I'm not a big fan, just given my recent experience with OpenToken. So far, I've screwed up two attempts to implement a simple grammar using recursive descent, but the LR parser generator got it right the first time :). >> Add_Operation_Instance is a type that implements {+ T}. We don't have >> {} as an operator in Ada (never thought I'd be wishing we did !). It >> takes two tokens to express that; Plus, and T. So I'm thinking we >> could use the "**" operator. That leaves us with: >> >> L.all := E & EOF; >> E.all := T ** Plus + Add_Integers'Access; >> T.all := F ** Times + Multiply_Integers'Access; >> F.all := Int_Literal or >> New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access; >> > Yes, it is worth it IMHO, but I'm not sure about "**". Consider: > > E -> T {+ U} > > (where the '+' might actually be some other, asymmetric, operator). > In this case your use of "**" cannot work. Not directly, but it can be done this way: E := T or T & Plus & U ** Plus; I'm not sure what you mean by "asymmetric" here. If you mean "must have T on left, U on right", then the use of {} is inappropriate. {} says this is legal: T + U + U + U Hmm. If we only want to allow only T + U + T + U, we'd have to do this: E -> {+ R} R -> T + U where the two "+" operators are different. If you mean "non-commutative", that's fine, but not very relevant. OpenToken assumes everything is non-commutative. > I think such a production should be represented with "or", e.g. > > E.all := T or (T & Plus & U + Add_Integers'Access); This doesn't express the repetition that {} does. It implements the production: E -> T | T + U If we want to fully express E -> T {+ U} without the braces, we need to change this to allow recursion: E -> T | E + U That gives the repetition of + U, and the left recursion that kills recursive descent. Which is why Ted invented the List token, as an alternative to Sequence and Selection. At least, that's the result; I'm not sure it was the motivation; the comments are not clear. >> I guess "*" would work equally well, and might be a closer match to >> the intended semantics. It's not used elsewere in OpenToken as a >> grammar operator. But it is used in Extended Backus-Naur Form to >> indicate a precise number of repetitions, and we might add that to >> OpenToken, as well. > > I like "**" better because it is asymmetric whereas "*" is symmetric. Good point. >> I've gotten backtracking in recursive descent actually working; it was >> pretty broken before. >> >> None of this will be in 3.1; I want to get that release out quickly. >> But I got inspired :). > > How about including the fix for backtracking in 3.1? I would think > that such an important fix should get high priority. That required significant changes, and I'm not done making things clean. With 3.1 you can't even declare a grammar that needs backtracking, because it only allows lookahead 1 for recursive descent; you only need backtracking with more lookaheads. So 3.1 isn't broken; it's just lacking in expressive power. I only discovered backtracking was broken by changing things to allow more lookaheads. If you want to play with backtracking, it's in the org.opentoken.stephe branch in ada-france. See Test/test_backtrack.adb. That has "**" as well; see Test/test_list_actions.adb and Examples/ASU_Example_5_10/asu_example_5_10_rd_no_mixin-run.adb. Turn on Trace_Parse; it's amazing how many operations it takes to parse 5 + 6 * 3 with a naive parser :). There may be a 4.0 release shortly after the 3.1 release. The 3.1 release is just waiting on some 64 bit compilation flags from Reto Buerki. -- -- Stephe