comp.lang.ada
 help / color / mirror / Atom feed
From: Ludovic Brenta <ludovic@ludovic-brenta.org>
Subject: Re: working towards OpenToken version 4
Date: Tue, 28 Jul 2009 02:29:35 -0700 (PDT)
Date: 2009-07-28T02:29:35-07:00	[thread overview]
Message-ID: <2df50994-4bed-4396-aafb-4efc31dd02b6@c1g2000yqi.googlegroups.com> (raw)
In-Reply-To: uhbwxfzog.fsf@stephe-leake.org

Stephen Leake wrote on comp.lang.ada:
> I've been thinking about ways to improve the recursive descent support
> in OpenToken.

Great. I like recursive descent parsers :)

> One thing that could use work is the syntax for grammar specification.
>
> Consider example 5.10 from the dragon book; see
> OpenToken/Examples/ASU_Example_5_10/asu_example_5_10.ads
>
> The grammar, with production actions, is given as:
>
>    --  L -> E         print (L.val)
>    --  E -> E + T     E.val := E1.val + T.val
>    --  E -> T
>    --  T -> T * F     T.val := T1.val * F.val
>    --  T -> F
>    --  F -> ( E )     F.val := E.val
>    --  F -> digit
>
> For the OpenToken LR parser generator, the code that expresses this is:
>
>    Grammar : constant Production_List.Instance :=
>      L <= E & EOF                      + Simple_Integer.Print_Value'Access and
>      E <= E & Plus & T                 + Simple_Integer.Add_Integers       and
>      E <= T                                                                and
>      T <= T & Times & F                + Simple_Integer.Multiply_Integers  and
>      T <= F                                                                and
>      F <= Left_Paren & E & Right_Paren + Simple_Integer.Synthesize_Second  and
>      F <= Int_Literal;
>
> A very nice match.
>
> However, for the recursive descent case, things are not so nice.
> First, the grammar must be rewritten. That's expected; it's left
> recursive, which leads to infinite recursion. Ted Dennison rewrote it
> this way, using {} to indicate possible repetition (I had to figure
> this out, it's not in the code):
>
>    --  L -> E           print (L.val)
>    --  E -> T {+ T}     E.val := T1.val + T2.val ...
>    --  T -> F {* F}     T.val := F1.val * F2.val ...
>    --  F -> ( E )       F.val := E.val
>    --  F -> digit
>
> The code for this is (see
> OpenToken/Examples/ASU_Example_5_10/asu_example_5_10_rd-run.adb):
>
>    L.all := E & EOF;
>    E.all := Operation_List.Class (Add_Operation_Instance'(Get (T, Plus)));
>    T.all := Operation_List.Class (Multiply_Operation_Instance'(Get (F, Times)));
>    F.all := Int_Literal or Expression_Sequence_Handle'
>        (new Expression_Sequence_Instance'(Left_Paren & E & Right_Paren));
>
> not so nice.
>
> First, the actions are nowhere to be seen; they are given by
> overriding declarations elsewhere. We can fix that by making them
> runtime procedure pointers, and using the "+" operator, as for the LR
> parser generator. That will significantly reduce the number of generic
> instantiations required by this example, while introducing negligible
> runtime overhead.
>
> Add_Operation_Instance is a type that implements {+ T}. We don't have
> {} as an operator in Ada (never thought I'd be wishing we did !). It
> takes two tokens to express that; Plus, and T. So I'm thinking we
> could use the "**" operator. That leaves us with:
>
>    L.all := E & EOF;
>    E.all := T ** Plus + Add_Integers'Access;
>    T.all := F ** Times + Multiply_Integers'Access;
>    F.all := Int_Literal or
>        New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access;
>
> Can't get rid of that New_Instance call; I've tried :).
>
> Opinions? Is this worth it? (besides being fun :)

Yes, it is worth it IMHO, but I'm not sure about "**". Consider:

E -> T {+ U}

(where the '+' might actually be some other, asymmetric, operator).
In this case your use of "**" cannot work.  I think such a production
should be represented with "or", e.g.

E.all := T or (T & Plus & U + Add_Integers'Access);

> I guess "*" would work equally well, and might be a closer match to
> the intended semantics. It's not used elsewere in OpenToken as a
> grammar operator. But it is used in Extended Backus-Naur Form to
> indicate a precise number of repetitions, and we might add that to
> OpenToken, as well.

I like "**" better because it is asymmetric whereas "*" is symmetric.

> I've gotten backtracking in recursive descent actually working; it was
> pretty broken before.
>
> None of this will be in 3.1; I want to get that release out quickly.
> But I got inspired :).

How about including the fix for backtracking in 3.1? I would think
that such an important fix should get high priority.

> Ludovic: the ada-france mtn server is down. Can you fix?

Fixed.

--
Ludovic Brenta.



  reply	other threads:[~2009-07-28  9:29 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-27 23:56 working towards OpenToken version 4 Stephen Leake
2009-07-28  9:29 ` Ludovic Brenta [this message]
2009-07-28 13:12   ` Dmitry A. Kazakov
2009-07-30  1:39   ` Stephen Leake
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox