From: Stephen Leake
Newsgroups: comp.lang.ada
Subject: Re: disambiguating 'begin'
Date: Thu, 04 Oct 2012 04:23:35 -0400
Message-ID: <858vbmsl14.fsf@stephe-leake.org>

gautier_niouzes@hotmail.com writes:

> On Tuesday, 2 October 2012 12:48:41 UTC+2, Stephen Leake wrote:
>
>> The problem is that "begin" is used in two ways: as the _start_ of a
>> block, and as the _divider_ between declarations and statements in a
>> block
>
> "begin" is always the start of a block's statements, and sometimes the
> start of the block itself.

Yes, and that "sometimes" is the problem.

> Is there a kind of grammar
> with SMIE?

Yes, it's BNF, but a "very dumb" kind of BNF. The core parser only
knows about operator precedence; it discards all the other information
in the BNF.
So the grammar fragment for block statements looks like this:

   (identifier ":" "declare-label" declarations "begin-divide" statements "end-other")
   ("declare-open" declarations "begin-divide" statements "end-other")
   ("begin-open" statements "end-other")

Note that there are two variants of "declare" and two of "begin" (and
the rest of the grammar has other variants of "end"). Each variant
must have a different precedence for this to work properly, which
means the lexer must distinguish between the variants.

For "declare", that's not hard: look for a preceding ":" token. For
"begin", however, there is no simple way to tell the variants apart;
you have to scan all the way back to the start of the file.

I figured out a way to deal with this. I deliberately start a forward
parse at the beginning of the file. When the parser reaches a "begin",
I examine the parser stack: it will either have a keyword that must
precede "begin-divide", or something else. That tells me which variant
it is.

Of course, doing that full-file scan every time you hit a "begin" is
painfully slow (I implemented it that way at first, just to see). So I
added a caching mechanism: once a "begin" has been classified, the
result is remembered until text in front of it is edited.

-- 
-- Stephe
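A minimal Python sketch of the classify-and-cache idea described in the post, under loudly stated assumptions: the input is an already-tokenized toy subset of Ada in which only `declare`, `is` (assumed to always open a body), `begin` and `end` matter; the names `BeginClassifier`, `classify`, `edit`, `DIVIDER_OPENERS` and the `scans` counter are hypothetical; and the hand-rolled stack only stands in for SMIE's real parser stack, which this does not reproduce. It parses forward from the start of the token list, decides each `begin` by the keyword on top of the stack, caches the answer, and forgets cached answers at or after an edit.

```python
from typing import Dict, List

# Toy-model assumption: "declare" and "is" are the only keywords that must
# precede a "begin-divide". Real Ada also uses "is" in type declarations,
# which this sketch ignores.
DIVIDER_OPENERS = {"declare", "is"}


class BeginClassifier:
    """Hypothetical sketch: classify each "begin" as "begin-open" or
    "begin-divide" by parsing forward from the start of the file, and
    remember the answers until text in front of them is edited."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = list(tokens)
        self.cache: Dict[int, str] = {}  # token index -> "begin-open"/"begin-divide"
        self.scans = 0                   # number of forward parses performed

    def classify(self, index: int) -> str:
        if self.tokens[index] != "begin":
            raise ValueError(f"token {index} is not 'begin'")
        if index in self.cache:          # cached: no scan needed
            return self.cache[index]
        self.scans += 1
        stack: List[str] = []            # toy stand-in for the parser stack
        for i, tok in enumerate(self.tokens[: index + 1]):
            if tok in DIVIDER_OPENERS:
                stack.append(tok)
            elif tok == "begin":
                if stack and stack[-1] in DIVIDER_OPENERS:
                    kind = "begin-divide"
                    stack[-1] = "begin"  # block is now in its statement part
                else:
                    kind = "begin-open"
                    stack.append("begin")
                self.cache[i] = kind     # cache every "begin" passed on the way
            elif tok == "end" and stack:
                # Toy assumption: every "end" closes a block or body; a real
                # parser must also match "end if", "end loop", etc.
                stack.pop()
        return self.cache[index]

    def edit(self, start: int, stop: int, new_tokens: List[str]) -> None:
        """Replace tokens[start:stop]; forget answers for begins at or after start."""
        self.tokens[start:stop] = new_tokens
        self.cache = {i: k for i, k in self.cache.items() if i < start}


if __name__ == "__main__":
    src = ("procedure P is X : Integer ; "
           "begin declare Y : Integer ; begin null ; end ; "
           "begin null ; end ; end P ;")
    c = BeginClassifier(src.split())
    print(c.classify(18))  # begin-open: the bare nested block
    print(c.classify(7))   # begin-divide: answered from the cache
    print(c.scans)         # 1
```

Caching every "begin" passed during a scan, not just the one asked about, is a choice made for this sketch: one forward parse then answers all earlier "begin"s. Dropping only the entries at or after the edit point mirrors the rule in the post that a classification stays valid until text in front of it changes.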