From: Stephen Leake <stephen_leake@stephe-leake.org>
Subject: Re: working towards OpenToken version 4
Date: Wed, 29 Jul 2009 21:39:39 -0400
Date: 2009-07-29T21:39:39-04:00 [thread overview]
Message-ID: <u8wi7ymn8.fsf@stephe-leake.org> (raw)
In-Reply-To: 2df50994-4bed-4396-aafb-4efc31dd02b6@c1g2000yqi.googlegroups.com
Ludovic Brenta <ludovic@ludovic-brenta.org> writes:
> Stephen Leake wrote on comp.lang.ada:
>> I've been thinking about ways to improve the recursive descent support
>> in OpenToken.
>
> Great. I like recursive descent parsers :)
I'm not a big fan, just given my recent experience with OpenToken. So
far, I've screwed up two attempts to implement a simple grammar using
recursive descent, but the LR parser generator got it right the first
time :).
>> Add_Operation_Instance is a type that implements {+ T}. We don't have
>> {} as an operator in Ada (never thought I'd be wishing we did !). It
>> takes two tokens to express that; Plus, and T. So I'm thinking we
>> could use the "**" operator. That leaves us with:
>>
>> L.all := E & EOF;
>> E.all := T ** Plus + Add_Integers'Access;
>> T.all := F ** Times + Multiply_Integers'Access;
>> F.all := Int_Literal or
>> New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access;
>>
> Yes, it is worth it IMHO, but I'm not sure about "**". Consider:
>
> E -> T {+ U}
>
> (where the '+' might actually be some other, asymmetric, operator).
> In this case your use of "**" cannot work.
Not directly, but it can be done this way:
E := T or T & Plus & U ** Plus;
I'm not sure what you mean by "asymmetric" here. If you mean "must
have T on left, U on right", then the use of {} is inappropriate. {}
says this is legal:
T + U + U + U
Hmm. If we only want to allow only T + U + T + U, we'd have to do this:
E -> {+ R}
R -> T + U
where the two "+" operators are different.
If you mean "non-commutative", that's fine, but not very relevant.
OpenToken assumes everything is non-commutative.
> I think such a production should be represented with "or", e.g.
>
> E.all := T or (T & Plus & U + Add_Integers'Access);
This doesn't express the repetition that {} does. It implements the
production:
E -> T | T + U
If we want to fully express E -> T {+ U} without the braces, we need
to change this to allow recursion:
E -> T | E + U
That gives the repetition of + U, and the left recursion that kills
recursive descent.
Which is why Ted invented the List token, as an alternative to
Sequence and Selection. At least, that's the result; I'm not sure it
was the motivation; the comments are not clear.
>> I guess "*" would work equally well, and might be a closer match to
>> the intended semantics. It's not used elsewere in OpenToken as a
>> grammar operator. But it is used in Extended Backus-Naur Form to
>> indicate a precise number of repetitions, and we might add that to
>> OpenToken, as well.
>
> I like "**" better because it is asymmetric whereas "*" is symmetric.
Good point.
>> I've gotten backtracking in recursive descent actually working; it was
>> pretty broken before.
>>
>> None of this will be in 3.1; I want to get that release out quickly.
>> But I got inspired :).
>
> How about including the fix for backtracking in 3.1? I would think
> that such an important fix should get high priority.
That required significant changes, and I'm not done making things
clean.
With 3.1 you can't even declare a grammar that needs backtracking,
because it only allows lookahead 1 for recursive descent; you only
need backtracking with more lookaheads. So 3.1 isn't broken; it's just
lacking in expressive power. I only discovered backtracking was broken
by changing things to allow more lookaheads.
If you want to play with backtracking, it's in the
org.opentoken.stephe branch in ada-france. See
Test/test_backtrack.adb. That has "**" as well; see
Test/test_list_actions.adb and
Examples/ASU_Example_5_10/asu_example_5_10_rd_no_mixin-run.adb. Turn
on Trace_Parse; it's amazing how many operations it takes to parse 5 +
6 * 3 with a naive parser :).
There may be a 4.0 release shortly after the 3.1 release. The 3.1
release is just waiting on some 64 bit compilation flags from Reto
Buerki.
--
-- Stephe
prev parent reply other threads:[~2009-07-30 1:39 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-07-27 23:56 working towards OpenToken version 4 Stephen Leake
2009-07-28 9:29 ` Ludovic Brenta
2009-07-28 13:12 ` Dmitry A. Kazakov
2009-07-30 1:39 ` Stephen Leake [this message]
replies disabled
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox