From mboxrd@z Thu Jan  1 00:00:00 1970
X-Google-Thread: 103376,1b6a1fe7038b5b8e
X-Google-NewGroupId: yes
X-Google-Attributes: gida07f3367d7,domainid0,public,usenet
X-Google-Language: ENGLISH,UTF8
Received: by 10.66.81.74 with SMTP id y10mr1415981pax.17.1349339018998;
        Thu, 04 Oct 2012 01:23:38 -0700 (PDT)
Path: 
 t10ni23609894pbh.0!nntp.google.com!npeer03.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail
From: Stephen Leake <stephen_leake@stephe-leake.org>
Newsgroups: comp.lang.ada
Subject: Re: disambiguating 'begin'
References: <85obkl2lq1.fsf@stephe-leake.org>
 	<5b0a709d-1abc-4b86-a9fe-320c228c1d18@googlegroups.com>
Date: Thu, 04 Oct 2012 04:23:35 -0400
Message-ID: <858vbmsl14.fsf@stephe-leake.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (windows-nt)
Cancel-Lock: sha1:omp/49f7jgq5BpbNE39bzJnt3oQ=
MIME-Version: 1.0
X-Complaints-To: abuse@flashnewsgroups.com
Organization: FlashNewsgroups.com
X-Trace: cd61d506d478ae029e66119096
X-Received-Bytes: 2742
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
List-Id: <comp.lang.ada>

gautier_niouzes@hotmail.com writes:

> On Tuesday, 2 October 2012 12:48:41 UTC+2, Stephen Leake wrote:
>
>> The problem is that "begin" is used in two ways: as the _start_ of a
>> block, and as the _divider_ between declarations and statements in a
>> block
>
> "begin" is always the start of a block's statements, and sometimes the
> start of the block itself.

Yes, and that "sometimes" is the problem.

> Is there a kind of grammar
> with SMIE?

Yes, it's BNF. However, it's a kind of "very dumb" BNF. The core
parser only knows about operator precedence; it forgets all the other
information that is in the BNF.

So the grammar fragment for block statements looks like this:

(identifier ":" "declare-label" declarations "begin-divide" statements "end-other")
("declare-open" declarations "begin-divide" statements "end-other")
("begin-open" statements "end-other")

Note that there are two variants of "declare", and two of "begin" (and
the rest of the grammar has other variants of "end"). That's because
each variant must have different precedence for this to work properly.

That means the lexer must distinguish between the variants. For
"declare", that's not hard; look for a preceding ":" token.

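A minimal sketch of that one-token lookback, written in Python rather
than the Emacs Lisp the real lexer uses. The function name and the flat
list of token strings are made up for illustration; only the ":" rule
and the two variant names come from the grammar fragment above.

```python
def refine_declare(tokens):
    """Rename each "declare" to "declare-label" or "declare-open".

    Hypothetical helper: `tokens` is a plain list of token strings.
    A "declare" directly preceded by ":" belongs to a labeled block.
    """
    refined = []
    for i, tok in enumerate(tokens):
        if tok == "declare":
            prev = tokens[i - 1] if i > 0 else None
            refined.append("declare-label" if prev == ":" else "declare-open")
        else:
            refined.append(tok)
    return refined


print(refine_declare(["Outer", ":", "declare", "begin", "end"]))
print(refine_declare(["declare", "begin", "end"]))
```

The point is that the decision needs only the immediately preceding
token, so it stays cheap no matter where in the file it happens.
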
However, for "begin", there is no simple way to distinguish between
them; you have to scan all the way back to the start of the file.

I did figure out a way to deal with this, though. I can deliberately
start a forward parse at the beginning of the file. When the parser
reaches a "begin", I examine the parser stack; it will hold either a
keyword that must precede "begin-divide", or something else. That tells
me which variant it is.
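
The stack check can be sketched with a toy forward scan. This is Python,
not SMIE, and it makes several simplifying assumptions that are mine:
the set of keywords that open a declarative part is cut down to three,
every "procedure" or "function" has a body, and "end" only ever closes
a block (no "end if" or "end loop"). All names are hypothetical.

```python
# Assumed, deliberately incomplete set of keywords that must be
# followed (eventually) by a "begin-divide".
DECL_OPENERS = {"declare", "procedure", "function"}


def classify_begins(tokens):
    """Scan forward from the start; map each "begin" index to its variant."""
    stack = []    # open constructs, innermost last
    result = {}
    for pos, tok in enumerate(tokens):
        if tok in DECL_OPENERS:
            stack.append(tok)
        elif tok == "begin":
            if stack and stack[-1] in DECL_OPENERS:
                # A declarative part is pending: this "begin" divides it
                # from the statements.
                result[pos] = "begin-divide"
                stack[-1] = "body"
            else:
                # Nothing is waiting for a divider: this "begin" opens
                # a block of its own.
                result[pos] = "begin-open"
                stack.append("body")
        elif tok == "end" and stack:
            stack.pop()
    return result


source = ("procedure Main is begin begin null ; end ; "
          "declare begin null ; end ; end Main ;")
print(classify_begins(source.split()))
```

Here the "begin" at index 3 closes the declarative part of Main, the
one at index 4 opens a bare block, and the one at index 10 follows a
"declare", so they come out as divide, open, divide.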

Of course, doing that full-file scan every time you hit a "begin" is
painfully slow (I implemented it that way at first, just to see). So I
added a caching mechanism: once a "begin" has been classified, the
result is remembered until text in front of it is edited.
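
The shape of such a cache, as a rough Python sketch. How the real code
stores the classifications is not described here, so the class, its
method names, and the use of buffer positions as keys are all
assumptions for illustration.

```python
class BeginCache:
    """Remember "begin" classifications until earlier text changes."""

    def __init__(self, classify):
        # `classify` is a hypothetical slow function: position -> variant.
        self._classify = classify
        self._kinds = {}

    def kind_at(self, pos):
        """Return the cached variant, running the slow scan only on a miss."""
        if pos not in self._kinds:
            self._kinds[pos] = self._classify(pos)
        return self._kinds[pos]

    def text_edited(self, edit_pos):
        """Drop every entry at or after the edit; earlier ones stay valid."""
        self._kinds = {p: k for p, k in self._kinds.items() if p < edit_pos}


scans = []

def slow_scan(pos):
    scans.append(pos)          # record each full scan
    return "begin-divide"

cache = BeginCache(slow_scan)
cache.kind_at(40)
cache.kind_at(40)              # served from the cache
cache.kind_at(10)
cache.text_edited(20)          # invalidates position 40 but not 10
cache.kind_at(10)
cache.kind_at(40)              # rescanned
print(scans)
```

An edit can only change the classification of a "begin" that comes
after it, which is why entries before the edit position are kept.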

-- 
-- Stephe