LALR parser question

comp.lang.ada
 help / color / mirror / Atom feed

* LALR parser question
@ 2013-04-28 13:37 Stephen Leake
  2013-04-28 14:43 ` Dmitry A. Kazakov
  2013-05-02  1:49 ` Randy Brukardt
  0 siblings, 2 replies; 24+ messages in thread
From: Stephen Leake @ 2013-04-28 13:37 UTC (permalink / raw)


As part of Emacs Ada mode 5.0, I'm building a generalized LALR grammar
for Ada (see
http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0 ) 

I'm having trouble with empty declarations. For example (using Bison
syntax for the grammar), a simplified subset of the Ada package_body sytax is:

package_body
  : IS declaration_list BEGIN SEMICOLON
  ;

declaration_list
  : declaration
  | declaration_list declaration
  ;

declaration
  : object_declaration
  | subprogram_declaration
  ;; ...
  ;

This does not allow an empty declaration_list. But Ada does, so the
question is how can we add that to the grammar.

There are three choices:

1) Add an empty declaration choice to declaration_list:

declaration_list
  : ;; empty list
  | declaration
  | declaration_list declaration
  ;

This is now redundant; since declaration_list can be empty, the second
choice is not needed:

declaration_list
  : ;; empty list
  | declaration_list declaration
  ;

2) Add an empty declaration choice to declaration:

declaration
  : ;; empty declaration
  | object_declaration
  | subprogram_declaration
  ;; ...
  ;

3) Add another choice in package_body that leaves out declaration_list:

package_body
  : PACKAGE name IS declaration_list BEGIN statement_list END SEMICOLON
  | PACKAGE name IS BEGIN statement_list END SEMICOLON
  ;

OpenToken cannot handle choice 1; every occurance of declaration_list
appears to be empty, giving parse errors at parse time. For example, on
this input:

is begin;

gives a syntax error:
shift_conflict_bug.input:2:4: Syntax error; expecting 'EOF_ID' or
'IS_ID'; found BEGIN_ID 'begin'

I'm not clear if this is expected because of the way LALR works, or if
this is a bug somewhere in OpenToken (either the grammar generator or
the parser); any clues?

Choice 2 leads to a shift/reduce conflict in the production for
package_body; I believe extending the OpenToken parser to a generalized
parser would allow it to handle this option. However, the OpenToken
grammar generator currently reports a bug, instead of reporting the
conflict.

Choice 3 works with the current OpenToken, but of course it is very
tedious; every occurance of declaration_list must be handled in the
same way.

Any other ways to handle this problem?

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-28 13:37 LALR parser question Stephen Leake
@ 2013-04-28 14:43 ` Dmitry A. Kazakov
  2013-04-30  1:19   ` Yannick Duchêne (Hibou57)
  2013-05-02  1:49 ` Randy Brukardt
  1 sibling, 1 reply; 24+ messages in thread
From: Dmitry A. Kazakov @ 2013-04-28 14:43 UTC (permalink / raw)


On Sun, 28 Apr 2013 08:37:33 -0500, Stephen Leake wrote:

> Any other ways to handle this problem?

Recursive decent parser.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-28 14:43 ` Dmitry A. Kazakov
@ 2013-04-30  1:19   ` Yannick Duchêne (Hibou57)
  2013-04-30  2:03     ` John B. Matthews
  2013-04-30 16:06     ` Shark8
  0 siblings, 2 replies; 24+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2013-04-30  1:19 UTC (permalink / raw)


Le Sun, 28 Apr 2013 16:43:06 +0200, Dmitry A. Kazakov  
<mailbox@dmitry-kazakov.de> a écrit:

> On Sun, 28 Apr 2013 08:37:33 -0500, Stephen Leake wrote:
>
>> Any other ways to handle this problem?
>
> Recursive decent parser.

Or return to the original ascent parser:

http://en.wikipedia.org/wiki/LALR_parser
> In 1965, Donald Knuth invented the LR parser (Left to Right,
> Rightmost derivation). The LR parser can recognize any
> deterministic context-free language in linear-bounded time.
> However, rightmost derivation has very large memory requirements
> and implementing an LR parser was impractical due to the limited
> memory of computers at that time. To address this shortcoming, in
> 1969, Frank DeRemer proposed two simplified versions of the LR
> parser, namely the Look-Ahead LR (LALR)

And above in the introduction, commenting about LALR:
> The simplification that takes place results in a parser with
> significantly reduced memory requirements but decreased language
> recognition power.

This suggest LALR is limited by design and the original LR was less  
limited.

I did a top‑down (decent) parser some many years ago, and was not that  
much happy with neither. Either ascent, decent, left‑right or right‑left,  
they all requires to torture the grammar to make the parser generator  
happy.

Note: GNAT uses a hand‑written parser (I'm pretty sure to remember I've  
seen it somewhere).


-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: Epigrams on Programming — Alan J. — P. Yale University



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30  1:19   ` Yannick Duchêne (Hibou57)
@ 2013-04-30  2:03     ` John B. Matthews
  2013-04-30  4:11       ` Yannick Duchêne (Hibou57)
  2013-04-30 16:06     ` Shark8
  1 sibling, 1 reply; 24+ messages in thread
From: John B. Matthews @ 2013-04-30  2:03 UTC (permalink / raw)


In article <op.wwbxyoz9ule2fv@cardamome>,
 Yannick Duchêne (Hibou57) <yannick_duchene@yahoo.fr> wrote:

> Note: GNAT uses a hand-written parser (I'm pretty sure to remember 
> I've  seen it somewhere).

See also GNAT: The GNU Ada Compiler, §2.2 The Parser, cited here:

<http://stackoverflow.com/a/15544761/230513>

-- 
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30  2:03     ` John B. Matthews
@ 2013-04-30  4:11       ` Yannick Duchêne (Hibou57)
  2013-04-30 11:55         ` Peter C. Chapin
  0 siblings, 1 reply; 24+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2013-04-30  4:11 UTC (permalink / raw)


Le Tue, 30 Apr 2013 04:03:13 +0200, John B. Matthews  
<nospam@nospam.invalid> a écrit:

> In article <op.wwbxyoz9ule2fv@cardamome>,
>  Yannick Duchêne (Hibou57) <yannick_duchene@yahoo.fr> wrote:
>
>> Note: GNAT uses a hand-written parser (I'm pretty sure to remember
>> I've  seen it somewhere).
>
> See also GNAT: The GNU Ada Compiler, §2.2 The Parser, cited here:
>
> <http://stackoverflow.com/a/15544761/230513>
>

While hand‑written parser is not the only worth choice, a good reason for  
this choice is as they say:

“GNU Ada Compiler, §2.2 The Parser”:
> At the architectural level the main subprogram of the
> GNAT parser is an Ada function (Par) that is called to
> analyze each compilation unit. The parser code is organized
> as a set of packages (subunits of the top-level function)
> each of which contains the parsing routines associated with
> one chapter of the Ada Reference Manual [AAR95].

You can't do this with a parser generated automatically from a grammar,  
you can't split the generated parser as it's an automata generated as a  
indivisible whole.

There are still some points I don't agree with…

> Better error messages. GNAT generates clear andprecise error messages
There is a common idiom with automatically generated parsers, to complete  
the grammar so that it include wrong production, which normally does not  
introduce ambiguities, as these are wrong and normally not recognized by  
the grammar. But I agree this is easier with a hand‑written parser.

> Clarity. The GNAT parser follows faithfully the
> grammar given in the Ada Reference Manual [AAR95]
You get the same from an AST produced by an automatically generated  
parser; clarity just happens to be visible more later.

> Performance. The GNAT parser is as fast as any Ada
> table driven parser, and arguably faster than a
> LALR parser.

The main issue with this argument, is that performance isn't a big issue  
at this stage. What comes next — analyses, transformations, code  
generation — is more important for the overall performance of a compiler.  
Then anyway, being exact is far more important than being fast, for a  
compiler, and in that matter, the two previous points values a lot more  
(the points about clarity and precision).



-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: Epigrams on Programming — Alan J. — P. Yale University



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30  4:11       ` Yannick Duchêne (Hibou57)
@ 2013-04-30 11:55         ` Peter C. Chapin
  2013-04-30 13:14           ` john
  0 siblings, 1 reply; 24+ messages in thread
From: Peter C. Chapin @ 2013-04-30 11:55 UTC (permalink / raw)

On 04/30/2013 12:11 AM, Yannick Duchêne (Hibou57) wrote:

> The main issue with this argument, is that performance isn't a big issue
> at this stage. What comes next — analyses, transformations, code
> generation — is more important for the overall performance of a
> compiler. Then anyway, being exact is far more important than being
> fast, for a compiler, and in that matter, the two previous points values
> a lot more (the points about clarity and precision).

I've started toying around with building an Ada 2012 parser using the 
ANTLR v4 parser generator tool. I haven't gotten very far with this yet 
so it's too early to draw any conclusions but ANTLR is quite powerful so 
I'm hoping the process will be fairly enjoyable... educational at least :)

ANTRL v3 generates LL(*) parsers. I understand ANTLR v4 uses a somewhat 
different algorithm and is quite liberal with what it will accept. I 
started by just typing in the grammar in the Ada reference manual and, 
aside from the left recursive loops, ANTRL v4 accepted it as is. (I'm 
not naive enough to believe that's good enough... so don't worry about 
that).

The cool thing about ANTLR v4 is that it separates the actions from the 
grammar itself so one can use the same grammar to generate parsers in 
multiple languages. At least that's the idea. It remains to be seen how 
well it will actually work for Ada.

Peter

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 11:55         ` Peter C. Chapin
@ 2013-04-30 13:14           ` john
  2013-04-30 14:14             ` Dmitry A. Kazakov
  2013-05-01 11:33             ` Peter C. Chapin
  0 siblings, 2 replies; 24+ messages in thread
From: john @ 2013-04-30 13:14 UTC (permalink / raw)



> I've started toying around with building an Ada 2012 parser using the 
> 
> ANTLR v4 parser generator tool. I haven't gotten very far with this yet 
> 
> so it's too early to draw any conclusions but ANTLR is quite powerful so 
> 
> I'm hoping the process will be fairly enjoyable... educational at least :)

For a small home-brewn language based on a virtual machine written in Ada I am currently investigating whether I should use a parser generator that can generate Ada code or write the tokenizer/parser by hand. I'm currently tending towards OpenToken but also looking for alternatives, so I was wondering: 

Have you checked ANTLR's Ada backend? It seems to have been lying dormant for a long time. How usable is it?



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 13:14           ` john
@ 2013-04-30 14:14             ` Dmitry A. Kazakov
  2013-05-01 11:33             ` Peter C. Chapin
  1 sibling, 0 replies; 24+ messages in thread
From: Dmitry A. Kazakov @ 2013-04-30 14:14 UTC (permalink / raw)

On Tue, 30 Apr 2013 06:14:13 -0700 (PDT), john@peppermind.com wrote:

> For a small home-brewn language based on a virtual machine written in Ada
> I am currently investigating whether I should use a parser generator that
> can generate Ada code or write the tokenizer/parser by hand. I'm currently
> tending towards OpenToken but also looking for alternatives,

An alternative to tokenizers and generators is table-driven parsers. E.g. 

http://www.dmitry-kazakov.de/ada/components.htm#Parsers_etc

Advantages of table-driven parsers are simplicity, speed, good error
messages, better software architecture. When the language is simple a
table-driven parser can directly execute (interpret) code without
generating any intermediate code, which is about a half of all cases I
dealt with.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30  1:19   ` Yannick Duchêne (Hibou57)
  2013-04-30  2:03     ` John B. Matthews
@ 2013-04-30 16:06     ` Shark8
  2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
  2013-04-30 21:18       ` Dmitry A. Kazakov
  1 sibling, 2 replies; 24+ messages in thread
From: Shark8 @ 2013-04-30 16:06 UTC (permalink / raw)


On Monday, April 29, 2013 7:19:26 PM UTC-6, Hibou57 (Yannick Duchêne) wrote:
> Le Sun, 28 Apr 2013 16:43:06 +0200, Dmitry A. Kazakov  
> 
> > On Sun, 28 Apr 2013 08:37:33 -0500, Stephen Leake wrote:
> >
> >> Any other ways to handle this problem?
> >
> > Recursive decent parser.
> 
> Or return to the original ascent parser:

Had to look that one up -- then got sidetracked, as there' s a lot of interesting stuff out there.

http://en.wikipedia.org/wiki/GLR_parser
http://link.springer.com/chapter/10.1007%2F3-540-53669-8_70 -- hybrid ascent-descent

It's an interesting field, IMO.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 16:06     ` Shark8
@ 2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
  2013-04-30 17:51         ` Shark8
  2013-05-01 12:31         ` Stephen Leake
  2013-04-30 21:18       ` Dmitry A. Kazakov
  1 sibling, 2 replies; 24+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2013-04-30 17:15 UTC (permalink / raw)


Le Tue, 30 Apr 2013 18:06:53 +0200, Shark8 <onewingedshark@gmail.com> a  
écrit:

> hybrid  ascent-descent

That's a part of what I have in mind too :p But won't read the paper you  
mention, because I don't want to be influenced by others ideas on that  
topic, to avoid to unconsciously repeat what others already did.

Since some time I'm thinking again I should try to do one applying the  
idea I have in mind. The hybrid ascent-descent is indeed an option, while  
I believe the ascent is the most natural. I think the parser should read  
as the humans read, and humans, the most of the time, start looking at the  
tiny parts (the bottom) and goes bottom‑up, and just occasionally go  
top‑down, and so just when there is a clear marker for this, like a  
heading title in a text as an example and any other marker which only  
appears at the top of a construct.


-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: Epigrams on Programming — Alan J. — P. Yale University



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
@ 2013-04-30 17:51         ` Shark8
  2013-04-30 18:52           ` Yannick Duchêne (Hibou57)
  2013-05-01 12:31         ` Stephen Leake
  1 sibling, 1 reply; 24+ messages in thread
From: Shark8 @ 2013-04-30 17:51 UTC (permalink / raw)


On Tuesday, April 30, 2013 11:15:00 AM UTC-6, Hibou57 (Yannick Duchêne) wrote:
> Le Tue, 30 Apr 2013 18:06:53 +0200, Shark8 a  
> 
> écrit:
> > hybrid  ascent-descent
> 
> That's a part of what I have in mind too :p But won't read the paper you  
> mention, because I don't want to be influenced by others ideas on that  
> topic, to avoid to unconsciously repeat what others already did.

:)
Yep, I haven't read it either but it looks quite interesting. And I can totally respect not wanting to copy others -- that was a main peeve of mine when in college, mentioning I was working on an OS and then being told "just get the Linux source."

> Since some time I'm thinking again I should try to do one applying the  
> idea I have in mind. The hybrid ascent-descent is indeed an option, while  
> I believe the ascent is the most natural.

I'm not sure about that; recursive decent is really natural for anyone who's done recursion -- See: http://compilers.iecc.com/crenshaw/

> I think the parser should read  
> as the humans read, and humans, the most of the time, start looking at the  
> tiny parts (the bottom) and goes bottom-up, and just occasionally go  
> top-down, and so just when there is a clear marker for this, like a  
> heading title in a text as an example and any other marker which only  
> appears at the top of a construct.

Really, I thought they read in a more top-down manner, with perhaps things front-loaded [i.e. forward-declared] as in sections and a TOC.

But that's why it's such an interesting field: there's a lot of ways to do it and a lot of them have different strengths and weaknesses.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 17:51         ` Shark8
@ 2013-04-30 18:52           ` Yannick Duchêne (Hibou57)
  0 siblings, 0 replies; 24+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2013-04-30 18:52 UTC (permalink / raw)


Le Tue, 30 Apr 2013 19:51:13 +0200, Shark8 <onewingedshark@gmail.com> a  
écrit:
>> I think the parser should read
>> as the humans read, and humans, the most of the time, start looking at  
>> the
>> tiny parts (the bottom) and goes bottom-up, and just occasionally go
>> top-down, and so just when there is a clear marker for this, like a
>> heading title in a text as an example and any other marker which only
>> appears at the top of a construct.
>
> Really, I thought they read in a more top-down manner, with perhaps  
> things front-loaded [i.e. forward-declared] as in sections and a TOC.

To be more precise I would say the reader go top‑down when he/she know  
what he/she is looking for (top‑down for seeking), and bottom‑up when  
he/she do not see the big lines and try to figure what he/she is reading  
(bottom‑up for figuring); as the latter is the case of each new things,  
which in turn is the case when parsing something, I am favouring bottom‑up.

That's oversimplified, I know, I just tried to tell my average though.

I have another reason for my preference for bottom‑up, which is for better  
error “recovering”. For that matter, I agree with GNAT's authors wording.

-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: Epigrams on Programming — Alan J. — P. Yale University



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 16:06     ` Shark8
  2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
@ 2013-04-30 21:18       ` Dmitry A. Kazakov
  2013-04-30 22:09         ` Shark8
  1 sibling, 1 reply; 24+ messages in thread
From: Dmitry A. Kazakov @ 2013-04-30 21:18 UTC (permalink / raw)

On Tue, 30 Apr 2013 09:06:53 -0700 (PDT), Shark8 wrote:

> Had to look that one up -- then got sidetracked, as there' s a lot of interesting stuff out there.
> 
> http://en.wikipedia.org/wiki/GLR_parser
> http://link.springer.com/chapter/10.1007%2F3-540-53669-8_70 -- hybrid ascent-descent
> 
> It's an interesting field, IMO.

The way humans parse texts and consequently grammars of programming are
uncorrelated to the classes of formal grammars. This is why formal grammars
proved to be quite useless for compiler construction. Practically there is
no language for which recursive descent parser would not work, provided
that literals, identifiers and expressions are parsed by other means.
Specifically, parsing Ada imposes no problems at all.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 21:18       ` Dmitry A. Kazakov
@ 2013-04-30 22:09         ` Shark8
  0 siblings, 0 replies; 24+ messages in thread
From: Shark8 @ 2013-04-30 22:09 UTC (permalink / raw)

On Tuesday, April 30, 2013 3:18:56 PM UTC-6, Dmitry A. Kazakov wrote:
> On Tue, 30 Apr 2013 09:06:53 -0700 (PDT), Shark8 wrote:
> 
> 
> The way humans parse texts and consequently grammars of programming are
> uncorrelated to the classes of formal grammars.

That's part of why they're interesting.

> This is why formal grammars proved to be quite useless for compiler construction.

They're not the only thing that could be defined to be parsed w/ a grammar though -- a configuration formulation, or a DB-dump, or as part of a distributed system's serialize/deserialize specification could all benefit from being defined in a formal grammar (as opposed to an ad-hoc/"grown" method).

> Practically there is no language for which recursive descent parser would not work, provided that literals, identifiers and expressions are parsed by other means.

Perhaps; but this doesn't mean that recursive-decent is the best in every situation. After all, just because there is no collection of sortable items that can't be sorted via Bubblesort doesn't mean that that's the suitable method for the problem.

> Specifically, parsing Ada imposes no problems at all.

*nod* -- I didn't think there was. I was talking more on the general subject.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 13:14           ` john
  2013-04-30 14:14             ` Dmitry A. Kazakov
@ 2013-05-01 11:33             ` Peter C. Chapin
  1 sibling, 0 replies; 24+ messages in thread
From: Peter C. Chapin @ 2013-05-01 11:33 UTC (permalink / raw)



On 04/30/2013 09:14 AM, john@peppermind.com wrote:

> Have you checked ANTLR's Ada backend? It seems to have been lying dormant for a long time. How usable is it?


I don't intend to use it.

In fact ANTLR v4 only generates Java at the moment (well, when I checked 
a few weeks ago). They say support for additional languages is coming 
but Ada isn't in the list so using ANTLR v4 to generate Ada would 
require adding that support as well.

The parser I'm playing with will use the Java generated code with Scala.

Peter



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
  2013-04-30 17:51         ` Shark8
@ 2013-05-01 12:31         ` Stephen Leake
  2013-05-01 13:57           ` Shark8
  1 sibling, 1 reply; 24+ messages in thread
From: Stephen Leake @ 2013-05-01 12:31 UTC (permalink / raw)


"Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr> writes:

> Le Tue, 30 Apr 2013 18:06:53 +0200, Shark8 <onewingedshark@gmail.com>
> a écrit:
>
>> hybrid  ascent-descent
>
> That's a part of what I have in mind too :p But won't read the paper
> you mention, because I don't want to be influenced by others ideas on
> that topic, to avoid to unconsciously repeat what others already did.

Those who refuse to learn from history are doomed to repeat it.

Newton admitted he stood on the shoulders of giants.

It is mere hubris that you think you can do better totally on your own!

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-01 12:31         ` Stephen Leake
@ 2013-05-01 13:57           ` Shark8
  0 siblings, 0 replies; 24+ messages in thread
From: Shark8 @ 2013-05-01 13:57 UTC (permalink / raw)

On Wednesday, May 1, 2013 6:31:00 AM UTC-6, Stephen Leake wrote:
> 
> It is mere hubris that you think you can do better totally on your own!

He hasn't said anything about thinking he'll do better; just that he's doing something similar.  One of the things about building off of the work of others is that sometimes you miss out on the deep understanding -- another is that something that's "ok"/"good-enough" can stall out research in a particular area for a generation (this happened with PROLOG and the WAM).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-04-28 13:37 LALR parser question Stephen Leake
  2013-04-28 14:43 ` Dmitry A. Kazakov
@ 2013-05-02  1:49 ` Randy Brukardt
  2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
  2013-05-03  9:45   ` Stephen Leake
  1 sibling, 2 replies; 24+ messages in thread
From: Randy Brukardt @ 2013-05-02  1:49 UTC (permalink / raw)


"Stephen Leake" <stephen_leake@stephe-leake.org> wrote in message 
news:85sj2aydwi.fsf@stephe-leake.org...
> As part of Emacs Ada mode 5.0, I'm building a generalized LALR grammar
> for Ada (see
> http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0 )
>
> I'm having trouble with empty declarations. For example (using Bison
> syntax for the grammar), a simplified subset of the Ada package_body sytax 
> is:
>
> package_body
>  : IS declaration_list BEGIN SEMICOLON
>  ;
>
> declaration_list
>  : declaration
>  | declaration_list declaration
>  ;
>
> declaration
>  : object_declaration
>  | subprogram_declaration
>  ;; ...
>  ;
>
> This does not allow an empty declaration_list. But Ada does, so the
> question is how can we add that to the grammar.
>
> There are three choices:
>
> 1) Add an empty declaration choice to declaration_list:
>
> declaration_list
>  : ;; empty list
>  | declaration
>  | declaration_list declaration
>  ;
>
> This is now redundant; since declaration_list can be empty, the second
> choice is not needed:
>
> declaration_list
>  : ;; empty list
>  | declaration_list declaration
>  ;
>
...
> OpenToken cannot handle choice 1; every occurance of declaration_list
> appears to be empty, giving parse errors at parse time. For example, on
> this input:
>
> is begin;
>
> gives a syntax error:
> shift_conflict_bug.input:2:4: Syntax error; expecting 'EOF_ID' or
> 'IS_ID'; found BEGIN_ID 'begin'
>
> I'm not clear if this is expected because of the way LALR works, or if
> this is a bug somewhere in OpenToken (either the grammar generator or
> the parser); any clues?

Choice 1 here is the best way to do this for an LALR(1) grammar. The 
equivalent part of the Janus/Ada grammer reads:

package_body ::= package_head declarative_part block_option ## 8

[Note: We have some extra productions in order to add "hooks" into the 
parsing for doing semantic analysis. Thus "block_option" here. And some of 
the names are weird as this started out as an Ada-80 grammar and has been 
modified periodically ever since. I'm afraid to touch it since so much of 
the compiler directly depends upon it.]

declarative_part ::= dec_option ## 0
dec_option ::= pragma_option
    | dec_option declaration pragma_option
    | dec_option body_dec pragma_option ## 0

pragma_option ::= pragma_option pragma_stmt
    | ## 0                                             -- Empty production

"declaration" and "body_dec" include the usual suspects, and no empty parts.

This generates a grammar with no conflicts.

> Any other ways to handle this problem?

Get a better parser generator?? If the University-of-Wisconsin research 
project derived generator that we use can handle the job, pretty much 
anything properly implemented can. (The original purpose of the generator we 
use was research into table-driven error correction; we're rewritten it 
several times and translated it to Ada so little of the original remains.)

Someone quoted the GNAT documentation as saying:

> Performance. The GNAT parser is as fast as any Ada
> table driven parser, and arguably faster than a
> LALR parser.

I'm dubious. Our parser is extremely small, and thus tends to get into the 
cache on modern machines. This makes it execute extremely fast.

I tend to agree that the speed of parsing doesn't matter that much, but you 
would be surprised at how slow some compilers were at parsing and syntax 
analysis, especially before GNAT came out. We actually had a pretty decent 
business from companies that bought our fast compiler just to do 
pre-compiles on before using their main development slug to build the actual 
system.

We in fact tried early on to abandon the table-driver LALR(1) parser, and 
discovered that a recursive descent parser would simply be too large for the 
48K (yes, that's K) that we had available for the compiler on CP/M. You get 
large bunch of subprogram calls and stack use, none of which happens with 
the LALR(1) table.

I'll admit that this code is so hot that I converted the Ada version into 
hand-coded assembler; every clock cycle that I could shave out of it was 
noticable in the build of a large system. (Perhaps that wouldn't be true 
today, I haven't touched that code since the mid-90s; the assembler code 
itself predates our version control system so I don't know when I originally 
wrote it except that it would have been before 1987.)

                                            Randy.





^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-02  1:49 ` Randy Brukardt
@ 2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
  2013-05-02 21:57     ` Randy Brukardt
  2013-05-06 18:25     ` Oliver Kellogg
  2013-05-03  9:45   ` Stephen Leake
  1 sibling, 2 replies; 24+ messages in thread
From: Yannick Duchêne (Hibou57) @ 2013-05-02  2:39 UTC (permalink / raw)


Le Thu, 02 May 2013 03:49:53 +0200, Randy Brukardt <randy@rrsoftware.com>  
a écrit:
> I tend to agree that the speed of parsing doesn't matter that much, but  
> you
> would be surprised at how slow some compilers were at parsing and syntax
> analysis, especially before GNAT came out. We actually had a pretty  
> decent
> business from companies that bought our fast compiler just to do
> pre-compiles on before using their main development slug to build the  
> actual
> system.

More precisely, is there an average picture? What is considered fast and  
what is considered slow? Which amount of sample Ada sources parsed in  
which amount of time? How slow was their “main development slug”? How fast  
is Janus Ada compared to theirs?


-- 
“Syntactic sugar causes cancer of the semi-colons.” [1]
“Structured Programming supports the law of the excluded muddle.” [1]
[1]: Epigrams on Programming — Alan J. — P. Yale University


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
@ 2013-05-02 21:57     ` Randy Brukardt
  2013-05-06 18:25     ` Oliver Kellogg
  1 sibling, 0 replies; 24+ messages in thread
From: Randy Brukardt @ 2013-05-02 21:57 UTC (permalink / raw)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 1884 bytes --]

"Yannick Duchêne (Hibou57)" <yannick_duchene@yahoo.fr> wrote in message 
news:op.wwfqzzi0ule2fv@cardamome...
Le Thu, 02 May 2013 03:49:53 +0200, Randy Brukardt <randy@rrsoftware.com>
a écrit:
>> I tend to agree that the speed of parsing doesn't matter that much, but 
>> you
>> would be surprised at how slow some compilers were at parsing and syntax
>> analysis, especially before GNAT came out. We actually had a pretty 
>> decent
>> business from companies that bought our fast compiler just to do
>> pre-compiles on before using their main development slug to build the 
>> actual
>> system.
>
>More precisely, is there an average picture? What is considered fast and 
>what is considered slow? Which amount of sample Ada sources parsed in 
>which amount of time? How slow was their "main development slug"? How fast 
>is Janus Ada compared to theirs?

I couldn't answer these questions numerically; we didn't own those other 
compilers that were too slow. And it was a long time ago. I'd heard that in 
some cases it was a factor of 10. But the numbers are hardly relevant today. 
Obviously, the developers thought that they saved enough time doing 
check-out compilations using Janus/Ada to justify paying for a license, 
which was not free. So that suggests that there the main system was pretty 
slow (probably on the order of hours).

After all, in those days it took 4 or so hours to do a full recompile of 
Janus/Ada. (It's under 5 minutes today.) We spent a lot of effort trying not 
to make changes to the symboltable in order to avoid those long compiles, 
which lead to all kinds of misuses of existing components. It's always a 
pain when one of those turns up (these days, I change them to have 
meaningful names and fewer tricks, of course, although its not always worth 
the effort to change).

                                                           Randy. 




^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-02  1:49 ` Randy Brukardt
  2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
@ 2013-05-03  9:45   ` Stephen Leake
  2013-05-03 22:57     ` Randy Brukardt
  2013-05-06  9:45     ` Stephen Leake
  1 sibling, 2 replies; 24+ messages in thread
From: Stephen Leake @ 2013-05-03  9:45 UTC (permalink / raw)


"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Stephen Leake" <stephen_leake@stephe-leake.org> wrote in message 
> news:85sj2aydwi.fsf@stephe-leake.org...
>> As part of Emacs Ada mode 5.0, I'm building a generalized LALR grammar
>> for Ada (see
>> http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0 )
>>
>> I'm having trouble with empty declarations. For example (using Bison
>> syntax for the grammar), a simplified subset of the Ada package_body sytax 
>> is:
>>
>> package_body
>>  : IS declaration_list BEGIN SEMICOLON
>>  ;
>>
>> declaration_list
>>  : declaration
>>  | declaration_list declaration
>>  ;
>>
>> declaration
>>  : object_declaration
>>  | subprogram_declaration
>>  ;; ...
>>  ;
>>
>> 1) Add an empty declaration choice to declaration_list:
>>
>> declaration_list
>>  : ;; empty list
>>  | declaration_list declaration
>>  ;
>>
> > 2) Add an empty declaration choice to declaration:
> >
> > declaration
> >   : ;; empty declaration
> >   | object_declaration
> >   | subprogram_declaration
> >   ;; ...
> >   ;


> Choice 1 here is the best way to do this for an LALR(1) grammar. The 
> equivalent part of the Janus/Ada grammer reads:
>
> package_body ::= package_head declarative_part block_option ## 8
>
> declarative_part ::= dec_option ## 0
> dec_option ::= pragma_option
>     | dec_option declaration pragma_option
>     | dec_option body_dec pragma_option ## 0
>
> pragma_option ::= pragma_option pragma_stmt
>     | ## 0                                             -- Empty production
>
> "declaration" and "body_dec" include the usual suspects, and no empty
> parts.

This actually the same as my choice 2; the empty production is not in
the list rule (dec_option), it's in one of the options of that rule
(pragma_option).
 
> This generates a grammar with no conflicts.

I get a conflict with choice 2, but I now believe that's a bug; I think
there's something wrong with the lookahead propagation step.

>> Any other ways to handle this problem?
>
> Get a better parser generator?? 

I'm fixing OpenToken; I'm asking if the OpenToken behavior for choice 1
is a bug. You have not directly addressed that, but thanks for letting
me know someone is using LALR in a commercial product!

-- 
-- Stephe

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-03  9:45   ` Stephen Leake
@ 2013-05-03 22:57     ` Randy Brukardt
  2013-05-06  9:45     ` Stephen Leake
  1 sibling, 0 replies; 24+ messages in thread
From: Randy Brukardt @ 2013-05-03 22:57 UTC (permalink / raw)


"Stephen Leake" <stephen_leake@stephe-leake.org> wrote in message 
news:85zjwc8khm.fsf@stephe-leake.org...
...
> I'm fixing OpenToken; I'm asking if the OpenToken behavior for choice 1
> is a bug. You have not directly addressed that, but thanks for letting
> me know someone is using LALR in a commercial product!

Our grammar goes further than yours as it attempts to insert pragmas at 
appropriate places. For your purposes, you could globally replace 
pragma_option by nothing at all and I believe you would get the same 
results. (I certainly can't see how you would get an additional conflict by 
removing some syntax.) If you did that, you would get a grammar that looked 
very similar to the one you originally had as Choice 1. At least that was my 
thought process.

One thing that is very important if to ensure that the empty production only 
appears in one level of your grammar; if not, you'll end up with a conflict 
because it won't be possible to tell which empty production to reduce. 
That's why we have the strangeness of a "program_option".

                                                     Randy.







^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-03  9:45   ` Stephen Leake
  2013-05-03 22:57     ` Randy Brukardt
@ 2013-05-06  9:45     ` Stephen Leake
  1 sibling, 0 replies; 24+ messages in thread
From: Stephen Leake @ 2013-05-06  9:45 UTC (permalink / raw)


Stephen Leake <stephen_leake@stephe-leake.org> writes:

> "Randy Brukardt" <randy@rrsoftware.com> writes:
>
>> "Stephen Leake" <stephen_leake@stephe-leake.org> wrote in message 
>> news:85sj2aydwi.fsf@stephe-leake.org...
>>> As part of Emacs Ada mode 5.0, I'm building a generalized LALR grammar
>>> for Ada (see
>>> http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0 )
>>>
>>> I'm having trouble with empty declarations. For example (using Bison
>>> syntax for the grammar), a simplified subset of the Ada package_body sytax 
>>> is:
>>>
>>> package_body
>>>  : IS declaration_list BEGIN SEMICOLON
>>>  ;
>>>
>>> declaration_list
>>>  : declaration
>>>  | declaration_list declaration
>>>  ;
>>>
>>> declaration
>>>  : object_declaration
>>>  | subprogram_declaration
>>>  ;; ...
>>>  ;
>>>
>>> 1) Add an empty declaration choice to declaration_list:
>>>
>>> declaration_list
>>>  : ;; empty list
>>>  | declaration_list declaration
>>>  ;
>>>
>> > 2) Add an empty declaration choice to declaration:
>> >
>> > declaration
>> >   : ;; empty declaration
>> >   | object_declaration
>> >   | subprogram_declaration
>> >   ;; ...
>> >   ;
>
>
>> Choice 1 here is the best way to do this for an LALR(1) grammar. The 
>> equivalent part of the Janus/Ada grammer reads:
>>
>> package_body ::= package_head declarative_part block_option ## 8
>>
>> declarative_part ::= dec_option ## 0
>> dec_option ::= pragma_option
>>     | dec_option declaration pragma_option
>>     | dec_option body_dec pragma_option ## 0
>>
>> pragma_option ::= pragma_option pragma_stmt
>>     | ## 0                                             -- Empty production
>>
>> "declaration" and "body_dec" include the usual suspects, and no empty
>> parts.
>
> This actually the same as my choice 2; the empty production is not in
> the list rule (dec_option), it's in one of the options of that rule
> (pragma_option).

I misread Randy's grammar the first time; it is actally a fourth choice.
In my terms, it is:

package_body
 : IS declarative_part BEGIN SEMICOLON
 ;

declarative_part
 : ;; empty
 | declaration_list
 ;

declaration_list
 : declaration
 | declaration_list declaration
 ;

OpenToken definitely has a bug in processing this grammar; the empty
production is simply not considered. That leads to incorrect errors at
parse time.

>> This generates a grammar with no conflicts.
>
> I get a conflict with choice 2, but I now believe that's a bug; I think
> there's something wrong with the lookahead propagation step.

My choice 2 is in fact ambiguous, and does have a shift/reduce conflict.
Consider these statements:

1) is <null declaration> begin ;
2) is is <null declaration> begin ; begin ;
3) is <null declaration> is <null declaration> begin ; begin ;

In parsing 3), the second 'is' causes a shift/reduce conflict;
shift to start the nested declaration (as in 2), reduce to the null
declaration. Both are correct according to the grammar.

OpenToken has the same bug in handling this empty production, but it
shows up sooner, because it interferes with the conflict reporting code.

-- 
-- Stephe



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: LALR parser question
  2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
  2013-05-02 21:57     ` Randy Brukardt
@ 2013-05-06 18:25     ` Oliver Kellogg
  1 sibling, 0 replies; 24+ messages in thread
From: Oliver Kellogg @ 2013-05-06 18:25 UTC (permalink / raw)


I remember having spent considerable effort making the ANTLR Ada95 grammar and parser [1] comparable in speed to GNAT in -gnatc mode (syntax check only).
I reached that goal by eliminating the syn preds (syntactic predicates) from the grammar. That bought a speedup of approx. 20 times. However, this was ANTLR v2 , and I haven't checked again with later ANTLR versions.

-- Oliver

[1] http://www.antlr3.org/grammar/ada


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2013-05-06 18:25 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-04-28 13:37 LALR parser question Stephen Leake
2013-04-28 14:43 ` Dmitry A. Kazakov
2013-04-30  1:19   ` Yannick Duchêne (Hibou57)
2013-04-30  2:03     ` John B. Matthews
2013-04-30  4:11       ` Yannick Duchêne (Hibou57)
2013-04-30 11:55         ` Peter C. Chapin
2013-04-30 13:14           ` john
2013-04-30 14:14             ` Dmitry A. Kazakov
2013-05-01 11:33             ` Peter C. Chapin
2013-04-30 16:06     ` Shark8
2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
2013-04-30 17:51         ` Shark8
2013-04-30 18:52           ` Yannick Duchêne (Hibou57)
2013-05-01 12:31         ` Stephen Leake
2013-05-01 13:57           ` Shark8
2013-04-30 21:18       ` Dmitry A. Kazakov
2013-04-30 22:09         ` Shark8
2013-05-02  1:49 ` Randy Brukardt
2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
2013-05-02 21:57     ` Randy Brukardt
2013-05-06 18:25     ` Oliver Kellogg
2013-05-03  9:45   ` Stephen Leake
2013-05-03 22:57     ` Randy Brukardt
2013-05-06  9:45     ` Stephen Leake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox