comp.lang.ada
 help / color / mirror / Atom feed
* working towards OpenToken version 4
@ 2009-07-27 23:56 Stephen Leake
  2009-07-28  9:29 ` Ludovic Brenta
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen Leake @ 2009-07-27 23:56 UTC (permalink / raw)


I've been thinking about ways to improve the recursive descent support
in OpenToken.

One thing that could use work is the syntax for grammar specification.

Consider example 5.10 from the dragon book; see
OpenToken/Examples/ASU_Example_5_10/asu_example_5_10.ads

The grammar, with production actions, is given as:

   --  L -> E         print (L.val)
   --  E -> E + T     E.val := E1.val + T.val
   --  E -> T
   --  T -> T * F     T.val := T1.val * F.val
   --  T -> F
   --  F -> ( E )     F.val := E.val
   --  F -> digit

For the OpenToken LR parser generator, the code that expresses this is:

   Grammar : constant Production_List.Instance :=
     L <= E & EOF                      + Simple_Integer.Print_Value'Access and
     E <= E & Plus & T                 + Simple_Integer.Add_Integers       and
     E <= T                                                                and
     T <= T & Times & F                + Simple_Integer.Multiply_Integers  and
     T <= F                                                                and
     F <= Left_Paren & E & Right_Paren + Simple_Integer.Synthesize_Second  and
     F <= Int_Literal;

A very nice match.

However, for the recursive descent case, things are not so nice.
First, the grammar must be rewritten. That's expected; it's left
recursive, which leads to infinite recursion. Ted Dennison rewrote it
this way, using {} to indicate possible repetition (I had to figure
this out, it's not in the code):

   --  L -> E           print (L.val)
   --  E -> T {+ T}     E.val := T1.val + T2.val ...
   --  T -> F {* F}     T.val := F1.val * F2.val ...
   --  F -> ( E )       F.val := E.val
   --  F -> digit

The code for this is (see
OpenToken/Examples/ASU_Example_5_10/asu_example_5_10_rd-run.adb):

   L.all := E & EOF;
   E.all := Operation_List.Class (Add_Operation_Instance'(Get (T, Plus)));
   T.all := Operation_List.Class (Multiply_Operation_Instance'(Get (F, Times)));
   F.all := Int_Literal or Expression_Sequence_Handle'
       (new Expression_Sequence_Instance'(Left_Paren & E & Right_Paren));

not so nice. 

First, the actions are nowhere to be seen; they are given by
overriding declarations elsewhere. We can fix that by making them
runtime procedure pointers, and using the "+" operator, as for the LR
parser generator. That will significantly reduce the number of generic
instantiations required by this example, while introducing negligible
runtime overhead.

Add_Operation_Instance is a type that implements {+ T}. We don't have
{} as an operator in Ada (never thought I'd be wishing we did !). It
takes two tokens to express that; Plus, and T. So I'm thinking we
could use the "**" operator. That leaves us with:

   L.all := E & EOF;
   E.all := T ** Plus + Add_Integers'Access;
   T.all := F ** Times + Multiply_Integers'Access;
   F.all := Int_Literal or 
       New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access;

Can't get rid of that New_Instance call; I've tried :).

Opinions? Is this worth it? (besides being fun :)

I guess "*" would work equally well, and might be a closer match to
the intended semantics. It's not used elsewere in OpenToken as a
grammar operator. But it is used in Extended Backus-Naur Form to
indicate a precise number of repetitions, and we might add that to
OpenToken, as well.

I've gotten backtracking in recursive descent actually working; it was
pretty broken before.

None of this will be in 3.1; I want to get that release out quickly.
But I got inspired :).

Ludovic: the ada-france mtn server is down. Can you fix?

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: working towards OpenToken version 4
  2009-07-27 23:56 working towards OpenToken version 4 Stephen Leake
@ 2009-07-28  9:29 ` Ludovic Brenta
  2009-07-28 13:12   ` Dmitry A. Kazakov
  2009-07-30  1:39   ` Stephen Leake
  0 siblings, 2 replies; 4+ messages in thread
From: Ludovic Brenta @ 2009-07-28  9:29 UTC (permalink / raw)


Stephen Leake wrote on comp.lang.ada:
> I've been thinking about ways to improve the recursive descent support
> in OpenToken.

Great. I like recursive descent parsers :)

> One thing that could use work is the syntax for grammar specification.
>
> Consider example 5.10 from the dragon book; see
> OpenToken/Examples/ASU_Example_5_10/asu_example_5_10.ads
>
> The grammar, with production actions, is given as:
>
>    --  L -> E         print (L.val)
>    --  E -> E + T     E.val := E1.val + T.val
>    --  E -> T
>    --  T -> T * F     T.val := T1.val * F.val
>    --  T -> F
>    --  F -> ( E )     F.val := E.val
>    --  F -> digit
>
> For the OpenToken LR parser generator, the code that expresses this is:
>
>    Grammar : constant Production_List.Instance :=
>      L <= E & EOF                      + Simple_Integer.Print_Value'Access and
>      E <= E & Plus & T                 + Simple_Integer.Add_Integers       and
>      E <= T                                                                and
>      T <= T & Times & F                + Simple_Integer.Multiply_Integers  and
>      T <= F                                                                and
>      F <= Left_Paren & E & Right_Paren + Simple_Integer.Synthesize_Second  and
>      F <= Int_Literal;
>
> A very nice match.
>
> However, for the recursive descent case, things are not so nice.
> First, the grammar must be rewritten. That's expected; it's left
> recursive, which leads to infinite recursion. Ted Dennison rewrote it
> this way, using {} to indicate possible repetition (I had to figure
> this out, it's not in the code):
>
>    --  L -> E           print (L.val)
>    --  E -> T {+ T}     E.val := T1.val + T2.val ...
>    --  T -> F {* F}     T.val := F1.val * F2.val ...
>    --  F -> ( E )       F.val := E.val
>    --  F -> digit
>
> The code for this is (see
> OpenToken/Examples/ASU_Example_5_10/asu_example_5_10_rd-run.adb):
>
>    L.all := E & EOF;
>    E.all := Operation_List.Class (Add_Operation_Instance'(Get (T, Plus)));
>    T.all := Operation_List.Class (Multiply_Operation_Instance'(Get (F, Times)));
>    F.all := Int_Literal or Expression_Sequence_Handle'
>        (new Expression_Sequence_Instance'(Left_Paren & E & Right_Paren));
>
> not so nice.
>
> First, the actions are nowhere to be seen; they are given by
> overriding declarations elsewhere. We can fix that by making them
> runtime procedure pointers, and using the "+" operator, as for the LR
> parser generator. That will significantly reduce the number of generic
> instantiations required by this example, while introducing negligible
> runtime overhead.
>
> Add_Operation_Instance is a type that implements {+ T}. We don't have
> {} as an operator in Ada (never thought I'd be wishing we did !). It
> takes two tokens to express that; Plus, and T. So I'm thinking we
> could use the "**" operator. That leaves us with:
>
>    L.all := E & EOF;
>    E.all := T ** Plus + Add_Integers'Access;
>    T.all := F ** Times + Multiply_Integers'Access;
>    F.all := Int_Literal or
>        New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access;
>
> Can't get rid of that New_Instance call; I've tried :).
>
> Opinions? Is this worth it? (besides being fun :)

Yes, it is worth it IMHO, but I'm not sure about "**". Consider:

E -> T {+ U}

(where the '+' might actually be some other, asymmetric, operator).
In this case your use of "**" cannot work.  I think such a production
should be represented with "or", e.g.

E.all := T or (T & Plus & U + Add_Integers'Access);

> I guess "*" would work equally well, and might be a closer match to
> the intended semantics. It's not used elsewere in OpenToken as a
> grammar operator. But it is used in Extended Backus-Naur Form to
> indicate a precise number of repetitions, and we might add that to
> OpenToken, as well.

I like "**" better because it is asymmetric whereas "*" is symmetric.

> I've gotten backtracking in recursive descent actually working; it was
> pretty broken before.
>
> None of this will be in 3.1; I want to get that release out quickly.
> But I got inspired :).

How about including the fix for backtracking in 3.1? I would think
that such an important fix should get high priority.

> Ludovic: the ada-france mtn server is down. Can you fix?

Fixed.

--
Ludovic Brenta.



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: working towards OpenToken version 4
  2009-07-28  9:29 ` Ludovic Brenta
@ 2009-07-28 13:12   ` Dmitry A. Kazakov
  2009-07-30  1:39   ` Stephen Leake
  1 sibling, 0 replies; 4+ messages in thread
From: Dmitry A. Kazakov @ 2009-07-28 13:12 UTC (permalink / raw)


On Tue, 28 Jul 2009 02:29:35 -0700 (PDT), Ludovic Brenta wrote:

> Stephen Leake wrote on comp.lang.ada:

>> I've been thinking about ways to improve the recursive descent support
>> in OpenToken.
> 
> Great. I like recursive descent parsers :)

The best of them is that they do not need grammars... (:-))

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: working towards OpenToken version 4
  2009-07-28  9:29 ` Ludovic Brenta
  2009-07-28 13:12   ` Dmitry A. Kazakov
@ 2009-07-30  1:39   ` Stephen Leake
  1 sibling, 0 replies; 4+ messages in thread
From: Stephen Leake @ 2009-07-30  1:39 UTC (permalink / raw)


Ludovic Brenta <ludovic@ludovic-brenta.org> writes:

> Stephen Leake wrote on comp.lang.ada:
>> I've been thinking about ways to improve the recursive descent support
>> in OpenToken.
>
> Great. I like recursive descent parsers :)

I'm not a big fan, just given my recent experience with OpenToken. So
far, I've screwed up two attempts to implement a simple grammar using
recursive descent, but the LR parser generator got it right the first
time :).

>> Add_Operation_Instance is a type that implements {+ T}. We don't have
>> {} as an operator in Ada (never thought I'd be wishing we did !). It
>> takes two tokens to express that; Plus, and T. So I'm thinking we
>> could use the "**" operator. That leaves us with:
>>
>> L.all := E & EOF;
>> E.all := T ** Plus + Add_Integers'Access;
>> T.all := F ** Times + Multiply_Integers'Access;
>> F.all := Int_Literal or
>>    New_Instance (Left_Paren & E & Right_Paren) + Copy_integer'Access;
>>
> Yes, it is worth it IMHO, but I'm not sure about "**". Consider:
>
> E -> T {+ U}
>
> (where the '+' might actually be some other, asymmetric, operator).
> In this case your use of "**" cannot work.  

Not directly, but it can be done this way:

E := T or T & Plus & U ** Plus;

I'm not sure what you mean by "asymmetric" here. If you mean "must
have T on left, U on right", then the use of {} is inappropriate. {}
says this is legal:

T + U + U + U

Hmm. If we only want to allow only T + U + T + U, we'd have to do this:

E -> {+ R}
R -> T + U

where the two "+" operators are different.

If you mean "non-commutative", that's fine, but not very relevant.
OpenToken assumes everything is non-commutative.

> I think such a production should be represented with "or", e.g.
>
> E.all := T or (T & Plus & U + Add_Integers'Access);

This doesn't express the repetition that {} does. It implements the
production:

E -> T | T + U

If we want to fully express E -> T {+ U} without the braces, we need
to change this to allow recursion:

E -> T | E + U

That gives the repetition of + U, and the left recursion that kills
recursive descent.

Which is why Ted invented the List token, as an alternative to
Sequence and Selection. At least, that's the result; I'm not sure it
was the motivation; the comments are not clear.

>> I guess "*" would work equally well, and might be a closer match to
>> the intended semantics. It's not used elsewere in OpenToken as a
>> grammar operator. But it is used in Extended Backus-Naur Form to
>> indicate a precise number of repetitions, and we might add that to
>> OpenToken, as well.
>
> I like "**" better because it is asymmetric whereas "*" is symmetric.

Good point.

>> I've gotten backtracking in recursive descent actually working; it was
>> pretty broken before.
>>
>> None of this will be in 3.1; I want to get that release out quickly.
>> But I got inspired :).
>
> How about including the fix for backtracking in 3.1? I would think
> that such an important fix should get high priority.

That required significant changes, and I'm not done making things
clean.

With 3.1 you can't even declare a grammar that needs backtracking,
because it only allows lookahead 1 for recursive descent; you only
need backtracking with more lookaheads. So 3.1 isn't broken; it's just
lacking in expressive power. I only discovered backtracking was broken
by changing things to allow more lookaheads.

If you want to play with backtracking, it's in the
org.opentoken.stephe branch in ada-france. See
Test/test_backtrack.adb. That has "**" as well; see
Test/test_list_actions.adb and
Examples/ASU_Example_5_10/asu_example_5_10_rd_no_mixin-run.adb. Turn
on Trace_Parse; it's amazing how many operations it takes to parse 5 +
6 * 3 with a naive parser :).

There may be a 4.0 release shortly after the 3.1 release. The 3.1
release is just waiting on some 64 bit compilation flags from Reto
Buerki.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-07-30  1:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-27 23:56 working towards OpenToken version 4 Stephen Leake
2009-07-28  9:29 ` Ludovic Brenta
2009-07-28 13:12   ` Dmitry A. Kazakov
2009-07-30  1:39   ` Stephen Leake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox