comp.lang.ada
 help / color / mirror / Atom feed
From: Stephen Leake <stephen_leake@stephe-leake.org>
Subject: Re: working towards OpenToken version 4
Date: Wed, 29 Jul 2009 21:39:39 -0400
Date: 2009-07-29T21:39:39-04:00	[thread overview]
Message-ID: <u8wi7ymn8.fsf@stephe-leake.org> (raw)
In-Reply-To: 2df50994-4bed-4396-aafb-4efc31dd02b6@c1g2000yqi.googlegroups.com

Ludovic Brenta <ludovic@ludovic-brenta.org> writes:

> Stephen Leake wrote on comp.lang.ada:
>> I've been thinking about ways to improve the recursive descent support
>> in OpenToken.
>
> Great. I like recursive descent parsers :)

I'm not a big fan, just given my recent experience with OpenToken. So
far, I've screwed up two attempts to implement a simple grammar using
recursive descent, but the LR parser generator got it right the first
time :).

>> Add_Operation_Instance is a type that implements {+ T}. We don't have
>> {} as an operator in Ada (never thought I'd be wishing we did !). It
>> takes two tokens to express that; Plus, and T. So I'm thinking we
>> could use the "**" operator. That leaves us with:
>>
>> L.all := E & EOF;
>> E.all := T ** Plus + Add_Integers'Access;
>> T.all := F ** Times + Multiply_Integers'Access;
>> F.all := Int_Literal or
>>    New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access;
>>
> Yes, it is worth it IMHO, but I'm not sure about "**". Consider:
>
> E -> T {+ U}
>
> (where the '+' might actually be some other, asymmetric, operator).
> In this case your use of "**" cannot work.  

Not directly, but it can be done this way:

E := T or T & Plus & U ** Plus;

I'm not sure what you mean by "asymmetric" here. If you mean "must
have T on left, U on right", then the use of {} is inappropriate. {}
says this is legal:

T + U + U + U

Hmm. If we only want to allow only T + U + T + U, we'd have to do this:

E -> {+ R}
R -> T + U

where the two "+" operators are different.

If you mean "non-commutative", that's fine, but not very relevant.
OpenToken assumes everything is non-commutative.

> I think such a production should be represented with "or", e.g.
>
> E.all := T or (T & Plus & U + Add_Integers'Access);

This doesn't express the repetition that {} does. It implements the
production:

E -> T | T + U

If we want to fully express E -> T {+ U} without the braces, we need
to change this to allow recursion:

E -> T | E + U

That gives the repetition of + U, and the left recursion that kills
recursive descent.

Which is why Ted invented the List token, as an alternative to
Sequence and Selection. At least, that's the result; I'm not sure it
was the motivation; the comments are not clear.

>> I guess "*" would work equally well, and might be a closer match to
>> the intended semantics. It's not used elsewere in OpenToken as a
>> grammar operator. But it is used in Extended Backus-Naur Form to
>> indicate a precise number of repetitions, and we might add that to
>> OpenToken, as well.
>
> I like "**" better because it is asymmetric whereas "*" is symmetric.

Good point.

>> I've gotten backtracking in recursive descent actually working; it was
>> pretty broken before.
>>
>> None of this will be in 3.1; I want to get that release out quickly.
>> But I got inspired :).
>
> How about including the fix for backtracking in 3.1? I would think
> that such an important fix should get high priority.

That required significant changes, and I'm not done making things
clean.

With 3.1 you can't even declare a grammar that needs backtracking,
because it only allows lookahead 1 for recursive descent; you only
need backtracking with more lookaheads. So 3.1 isn't broken; it's just
lacking in expressive power. I only discovered backtracking was broken
by changing things to allow more lookaheads.

If you want to play with backtracking, it's in the
org.opentoken.stephe branch in ada-france. See
Test/test_backtrack.adb. That has "**" as well; see
Test/test_list_actions.adb and
Examples/ASU_Example_5_10/asu_example_5_10_rd_no_mixin-run.adb. Turn
on Trace_Parse; it's amazing how many operations it takes to parse 5 +
6 * 3 with a naive parser :).

There may be a 4.0 release shortly after the 3.1 release. The 3.1
release is just waiting on some 64 bit compilation flags from Reto
Buerki.

-- 
-- Stephe



      parent reply	other threads:[~2009-07-30  1:39 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-27 23:56 working towards OpenToken version 4 Stephen Leake
2009-07-28  9:29 ` Ludovic Brenta
2009-07-28 13:12   ` Dmitry A. Kazakov
2009-07-30  1:39   ` Stephen Leake [this message]
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox