From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=unavailable autolearn_force=no version=3.4.4 Path: eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!news.eternal-september.org!mx05.eternal-september.org!feeder.eternal-september.org!nuzba.szn.dk!news.jacob-sparre.dk!munin.jacob-sparre.dk!pnx.dk!.POSTED!not-for-mail From: "Randy Brukardt" Newsgroups: comp.lang.ada Subject: Re: LALR parser question Date: Wed, 1 May 2013 20:49:53 -0500 Organization: Jacob Sparre Andersen Research & Innovation Message-ID: References: <85sj2aydwi.fsf@stephe-leake.org> NNTP-Posting-Host: static-69-95-181-76.mad.choiceone.net X-Trace: munin.nbi.dk 1367459396 26101 69.95.181.76 (2 May 2013 01:49:56 GMT) X-Complaints-To: news@jacob-sparre.dk NNTP-Posting-Date: Thu, 2 May 2013 01:49:56 +0000 (UTC) X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2900.5931 X-RFC2646: Format=Flowed; Original X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.6157 Xref: news.eternal-september.org comp.lang.ada:15281 Date: 2013-05-01T20:49:53-05:00 List-Id: "Stephen Leake" wrote in message news:85sj2aydwi.fsf@stephe-leake.org... > As part of Emacs Ada mode 5.0, I'm building a generalized LALR grammar > for Ada (see > http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html#ada-mode-5.0 ) > > I'm having trouble with empty declarations. For example (using Bison > syntax for the grammar), a simplified subset of the Ada package_body sytax > is: > > package_body > : IS declaration_list BEGIN SEMICOLON > ; > > declaration_list > : declaration > | declaration_list declaration > ; > > declaration > : object_declaration > | subprogram_declaration > ;; ... > ; > > This does not allow an empty declaration_list. But Ada does, so the > question is how can we add that to the grammar. > > There are three choices: > > 1) Add an empty declaration choice to declaration_list: > > declaration_list > : ;; empty list > | declaration > | declaration_list declaration > ; > > This is now redundant; since declaration_list can be empty, the second > choice is not needed: > > declaration_list > : ;; empty list > | declaration_list declaration > ; > ... > OpenToken cannot handle choice 1; every occurance of declaration_list > appears to be empty, giving parse errors at parse time. For example, on > this input: > > is begin; > > gives a syntax error: > shift_conflict_bug.input:2:4: Syntax error; expecting 'EOF_ID' or > 'IS_ID'; found BEGIN_ID 'begin' > > I'm not clear if this is expected because of the way LALR works, or if > this is a bug somewhere in OpenToken (either the grammar generator or > the parser); any clues? Choice 1 here is the best way to do this for an LALR(1) grammar. The equivalent part of the Janus/Ada grammer reads: package_body ::= package_head declarative_part block_option ## 8 [Note: We have some extra productions in order to add "hooks" into the parsing for doing semantic analysis. Thus "block_option" here. And some of the names are weird as this started out as an Ada-80 grammar and has been modified periodically ever since. I'm afraid to touch it since so much of the compiler directly depends upon it.] declarative_part ::= dec_option ## 0 dec_option ::= pragma_option | dec_option declaration pragma_option | dec_option body_dec pragma_option ## 0 pragma_option ::= pragma_option pragma_stmt | ## 0 -- Empty production "declaration" and "body_dec" include the usual suspects, and no empty parts. This generates a grammar with no conflicts. > Any other ways to handle this problem? Get a better parser generator?? If the University-of-Wisconsin research project derived generator that we use can handle the job, pretty much anything properly implemented can. (The original purpose of the generator we use was research into table-driven error correction; we're rewritten it several times and translated it to Ada so little of the original remains.) Someone quoted the GNAT documentation as saying: > Performance. The GNAT parser is as fast as any Ada > table driven parser, and arguably faster than a > LALR parser. I'm dubious. Our parser is extremely small, and thus tends to get into the cache on modern machines. This makes it execute extremely fast. I tend to agree that the speed of parsing doesn't matter that much, but you would be surprised at how slow some compilers were at parsing and syntax analysis, especially before GNAT came out. We actually had a pretty decent business from companies that bought our fast compiler just to do pre-compiles on before using their main development slug to build the actual system. We in fact tried early on to abandon the table-driver LALR(1) parser, and discovered that a recursive descent parser would simply be too large for the 48K (yes, that's K) that we had available for the compiler on CP/M. You get large bunch of subprogram calls and stack use, none of which happens with the LALR(1) table. I'll admit that this code is so hot that I converted the Ada version into hand-coded assembler; every clock cycle that I could shave out of it was noticable in the build of a large system. (Perhaps that wouldn't be true today, I haven't touched that code since the mid-90s; the assembler code itself predates our version control system so I don't know when I originally wrote it except that it would have been before 1987.) Randy.