comp.lang.ada
 help / color / mirror / Atom feed
From: "Randy Brukardt" <randy@rrsoftware.com>
Subject: Re: LALR parser question
Date: Wed, 1 May 2013 20:49:53 -0500
Date: 2013-05-01T20:49:53-05:00	[thread overview]
Message-ID: <klsgo4$pfl$1@munin.nbi.dk> (raw)
In-Reply-To: 85sj2aydwi.fsf@stephe-leake.org

"Stephen Leake" <stephen_leake@stephe-leake.org> wrote in message 
news:85sj2aydwi.fsf@stephe-leake.org...
> As part of Emacs Ada mode 5.0, I'm building a generalized LALR grammar
> for Ada (see
> http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0 )
>
> I'm having trouble with empty declarations. For example (using Bison
> syntax for the grammar), a simplified subset of the Ada package_body sytax 
> is:
>
> package_body
>  : IS declaration_list BEGIN SEMICOLON
>  ;
>
> declaration_list
>  : declaration
>  | declaration_list declaration
>  ;
>
> declaration
>  : object_declaration
>  | subprogram_declaration
>  ;; ...
>  ;
>
> This does not allow an empty declaration_list. But Ada does, so the
> question is how can we add that to the grammar.
>
> There are three choices:
>
> 1) Add an empty declaration choice to declaration_list:
>
> declaration_list
>  : ;; empty list
>  | declaration
>  | declaration_list declaration
>  ;
>
> This is now redundant; since declaration_list can be empty, the second
> choice is not needed:
>
> declaration_list
>  : ;; empty list
>  | declaration_list declaration
>  ;
>
...
> OpenToken cannot handle choice 1; every occurance of declaration_list
> appears to be empty, giving parse errors at parse time. For example, on
> this input:
>
> is begin;
>
> gives a syntax error:
> shift_conflict_bug.input:2:4: Syntax error; expecting 'EOF_ID' or
> 'IS_ID'; found BEGIN_ID 'begin'
>
> I'm not clear if this is expected because of the way LALR works, or if
> this is a bug somewhere in OpenToken (either the grammar generator or
> the parser); any clues?

Choice 1 here is the best way to do this for an LALR(1) grammar. The 
equivalent part of the Janus/Ada grammer reads:

package_body ::= package_head declarative_part block_option ## 8

[Note: We have some extra productions in order to add "hooks" into the 
parsing for doing semantic analysis. Thus "block_option" here. And some of 
the names are weird as this started out as an Ada-80 grammar and has been 
modified periodically ever since. I'm afraid to touch it since so much of 
the compiler directly depends upon it.]

declarative_part ::= dec_option ## 0
dec_option ::= pragma_option
    | dec_option declaration pragma_option
    | dec_option body_dec pragma_option ## 0

pragma_option ::= pragma_option pragma_stmt
    | ## 0                                             -- Empty production

"declaration" and "body_dec" include the usual suspects, and no empty parts.

This generates a grammar with no conflicts.

> Any other ways to handle this problem?

Get a better parser generator?? If the University-of-Wisconsin research 
project derived generator that we use can handle the job, pretty much 
anything properly implemented can. (The original purpose of the generator we 
use was research into table-driven error correction; we're rewritten it 
several times and translated it to Ada so little of the original remains.)

Someone quoted the GNAT documentation as saying:

> Performance. The GNAT parser is as fast as any Ada
> table driven parser, and arguably faster than a
> LALR parser.

I'm dubious. Our parser is extremely small, and thus tends to get into the 
cache on modern machines. This makes it execute extremely fast.

I tend to agree that the speed of parsing doesn't matter that much, but you 
would be surprised at how slow some compilers were at parsing and syntax 
analysis, especially before GNAT came out. We actually had a pretty decent 
business from companies that bought our fast compiler just to do 
pre-compiles on before using their main development slug to build the actual 
system.

We in fact tried early on to abandon the table-driver LALR(1) parser, and 
discovered that a recursive descent parser would simply be too large for the 
48K (yes, that's K) that we had available for the compiler on CP/M. You get 
large bunch of subprogram calls and stack use, none of which happens with 
the LALR(1) table.

I'll admit that this code is so hot that I converted the Ada version into 
hand-coded assembler; every clock cycle that I could shave out of it was 
noticable in the build of a large system. (Perhaps that wouldn't be true 
today, I haven't touched that code since the mid-90s; the assembler code 
itself predates our version control system so I don't know when I originally 
wrote it except that it would have been before 1987.)

                                            Randy.





  parent reply	other threads:[~2013-05-02  1:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-28 13:37 LALR parser question Stephen Leake
2013-04-28 14:43 ` Dmitry A. Kazakov
2013-04-30  1:19   ` Yannick Duchêne (Hibou57)
2013-04-30  2:03     ` John B. Matthews
2013-04-30  4:11       ` Yannick Duchêne (Hibou57)
2013-04-30 11:55         ` Peter C. Chapin
2013-04-30 13:14           ` john
2013-04-30 14:14             ` Dmitry A. Kazakov
2013-05-01 11:33             ` Peter C. Chapin
2013-04-30 16:06     ` Shark8
2013-04-30 17:15       ` Yannick Duchêne (Hibou57)
2013-04-30 17:51         ` Shark8
2013-04-30 18:52           ` Yannick Duchêne (Hibou57)
2013-05-01 12:31         ` Stephen Leake
2013-05-01 13:57           ` Shark8
2013-04-30 21:18       ` Dmitry A. Kazakov
2013-04-30 22:09         ` Shark8
2013-05-02  1:49 ` Randy Brukardt [this message]
2013-05-02  2:39   ` Yannick Duchêne (Hibou57)
2013-05-02 21:57     ` Randy Brukardt
2013-05-06 18:25     ` Oliver Kellogg
2013-05-03  9:45   ` Stephen Leake
2013-05-03 22:57     ` Randy Brukardt
2013-05-06  9:45     ` Stephen Leake
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox