disambiguating 'begin'

comp.lang.ada

* disambiguating 'begin'
From: Stephen Leake @ 2012-10-02 10:48 UTC


I've hit a major snag in the new Emacs Ada mode indentation engine. I'm
posting here in hopes of sympathy and good ideas :).

I'm using Emacs SMIE (Simple Minded Indentation Engine), which provides
facilities for implementing an operator precedence grammar
(http://en.wikipedia.org/wiki/Operator-precedence_parser - gotta love
Wikipedia! A more thorough description is at
http://dickgrune.com/Books/PTAPG_2nd_Edition/, or in the dragon book,
section 4.6).

The main rationale for this kind of parser is that it works equally well
backwards as forwards, as long as tokens are unique, or can be made so
by only looking at local text. That's useful for an indentation engine;
you can figure out the indentation by looking back in the text a short
way.

However, it turns out "begin" in Ada cannot be made unique in this
sense!

The problem is that "begin" is used in two ways: as the _start_ of a
block, and as the _divider_ between declarations and statements in a
block:

function F1 is
  <declarations>
begin -- divider
  <statements>
  begin -- block start
    <statements>
  end;
end;

In the operator precedence grammar, these two uses of begin must be
unique; they must be given separate keyword names.

However, as far as I can see, the only way to figure out which role
"begin" is playing is to parse from the start of the compilation unit.
Consider a package body:

package body Pack_1 is

  <declarations>

  function F1 is
    <declarations>
  begin -- divider
    <statements>
    begin -- block start
      <statements>
    end;
    <statements>
    begin -- block start
      <statements>
    end;
  end;

  begin -- divider
    <statements>
  end;

Here I've deliberately got the indentation wrong at the end, to
emphasize the ambiguity (that's how my latest indentation code indents
this :( ).

If we just look back a few keywords from each "begin", we can't tell
which role it is playing. In particular, the package "begin" just looks
like it follows a bunch of statements/declarations (SMIE can't tell the
difference between a statement and a declaration). We must go all the
way back to "package".

When all the tokens are properly disambiguated, SMIE can traverse
correctly from package "begin" to "package". But we can't do that while
disambiguating "begin"; that's circular (been there, done that :).
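
To illustrate why the disambiguation matters for backward traversal,
here is a minimal sketch in Python. It is not SMIE and not an operator
precedence parser, just a depth-counting backward scan over tokens that
are assumed to be refined already. The token names ("package-body-is",
"function-is", "begin-open", "begin-divide", "end-other") are
illustrative stand-ins. The point is that a block-start "begin" balances
an "end", while a divider "begin" does not.

```python
# Hypothetical refined token names; only block-start begins count as openers.
OPENERS = {"package-body-is", "function-is",
           "declare-open", "declare-label", "begin-open"}
CLOSERS = {"end-other"}


def find_opener(tokens, start):
    """Walk backward from tokens[start] (a divider begin) and return the
    index of the token that opened the enclosing construct, or None."""
    depth = 0
    for i in range(start - 1, -1, -1):
        tok = tokens[i]
        if tok in CLOSERS:
            depth += 1              # entering a nested construct (backward)
        elif tok in OPENERS:
            if depth == 0:
                return i            # this opener owns the divider
            depth -= 1              # leaving a nested construct
        # "begin-divide" and everything else is skipped
    return None


# Refined token stream for the package body example above.
tokens = ["package-body-is", "decl",
          "function-is", "decl", "begin-divide", "stmt",
          "begin-open", "stmt", "end-other", "stmt",
          "end-other",
          "begin-divide", "stmt", "end-other"]
```

With these tokens, find_opener(tokens, 11) walks from the package
"begin" back to index 0, and find_opener(tokens, 4) stops at the
function. If every "begin" were a single token treated as an opener, the
depth count would be off by one for each subprogram body crossed.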

The current Emacs Ada mode does this in a totally ad-hoc way, and I'm
pretty sure that introducing if-expressions will break it (they do break
something in the current indentation engine).

I believe the indentation engine in GPS always starts at the beginning
of the edit buffer, and scans forward to the current editing point,
keeping track of things. (I found the scanner code in the GPS source,
but not the code that calls it, so I'm not certain what string is passed
in).

Emacs has a facility for doing that (semantic), so I can give that a
try. But it's a lot more work, it's apparently not intended to be used
this way (it typically runs in the background), and I was making such
good progress with SMIE!

Any ideas?

(I did briefly consider requesting the ARG to make these two uses
separate keywords; I'm desperate :).

--
-- Stephe




* Re: disambiguating 'begin'
From: gautier_niouzes @ 2012-10-02 11:33 UTC


On Tuesday, 2 October 2012 12:48:41 UTC+2, Stephen Leake wrote:

> The problem is that "begin" is used in two ways: as the _start_ of a
> block, and as the _divider_ between declarations and statements in a
> block

I'm afraid you are making things more complicated than they are. Or is SMIE perhaps forcing you to do so?
"begin" is always the start of a block's statements, and sometimes the start of the block itself.
At level 0 it should appear at the same indentation column as "function"; at level 1 or more, at the same indentation column as its sibling statements.
And don't forget the more general "declare..begin..exception..end;" form!
Is there some kind of grammar with SMIE?
______________________________________________________________________________
Gautier's Ada programming -- http://gautiersblog.blogspot.com/search/label/Ada 
NB: follow the above link for a valid e-mail address




* Re: disambiguating 'begin'
From: Stephen Leake @ 2012-10-04  8:23 UTC

gautier_niouzes@hotmail.com writes:

> On Tuesday, 2 October 2012 12:48:41 UTC+2, Stephen Leake wrote:
>
>> The problem is that "begin" is used in two ways: as the _start_ of a
>> block, and as the _divider_ between declarations and statements in a
>> block
>
> "begin" is always the start of a block's statements, and sometimes the
> start of the block itself.

Yes, and that "sometimes" is the problem.

> Is there some kind of grammar
> with SMIE?

Yes, it's BNF. However, it's a kind of "very dumb" BNF. The core
parser only knows about operator precedence; it forgets all the other
information that is in the BNF.

So the grammar fragment for block statements looks like this:

(identifier ":" "declare-label" declarations "begin-divide" statements "end-other")
("declare-open" declarations "begin-divide" statements "end-other")
("begin-open" statements "end-other")

Note that there are two variants of "declare", and two of "begin" (and
the rest of the grammar has other variants of "end"). That's because
each variant must have different precedence for this to work properly.

That means the lexer must distinguish between the variants. For
"declare", that's not hard; look for a preceding ":" token.

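A minimal sketch of that "declare" refinement, in Python rather than
Emacs Lisp. The tokenizer and the function name refine_declare are
assumptions made for illustration; only the rule itself (check for a
preceding ":" token) comes from the description above.

```python
import re

# Crude tokenizer: identifiers/keywords, ":=", or one punctuation character.
# Numbers are dropped; comments and string literals are not handled.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|:=|[^\sA-Za-z0-9_]")


def tokenize(text):
    return TOKEN_RE.findall(text)


def refine_declare(tokens):
    """Rename each "declare" according to the token just before it."""
    refined = []
    for i, tok in enumerate(tokens):
        if tok.lower() == "declare":
            prev = tokens[i - 1] if i > 0 else None
            # A labeled block reads "Label : declare ...".
            refined.append("declare-label" if prev == ":" else "declare-open")
        else:
            refined.append(tok)
    return refined


labeled = refine_declare(tokenize("Outer : declare X : Integer; begin null; end Outer;"))
plain = refine_declare(tokenize("declare X : Integer; begin null; end;"))
```

Here labeled[2] comes out as "declare-label" and plain[0] as
"declare-open"; one token of look-back is enough.
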
However, for "begin", there is no simple way to distinguish between
them; you have to scan all the way back to the start of the file.

Still, I figured out a way to deal with this. I can deliberately start
a parse forward at the beginning of the file. Then when the parser gets
to a "begin", I can examine the parser stack; it will either have a
keyword that must precede "begin-divide", or something else. That lets
me decide which variant it is.
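
The forward scan can be sketched roughly as follows. This is Python with
a hand-rolled stack, not the SMIE parser stack, and it covers only a
simplified subset of Ada (no comments, strings, or accept statements).
The function name classify_begins and the frame labels "decl" and "stmt"
are assumptions for illustration.

```python
import re

# Crude tokenizer: identifiers/keywords, ":=", or one punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|:=|[^\sA-Za-z0-9_]")

UNIT_KEYWORDS = {"package", "procedure", "function", "task", "entry", "protected"}
# "end if;", "end loop;" etc. close constructs this sketch never pushes.
NON_BLOCK_ENDS = {"if", "loop", "case", "record", "select"}
# "is new", "is abstract", ... introduce no declarative part.
NO_BODY_AFTER_IS = {"new", "abstract", "separate", "null", "("}


def classify_begins(text):
    """Scan forward from the start and label each "begin" in order."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    stack = []        # "decl" = still awaiting its divider, "stmt" = in statements
    pending = False   # saw a unit keyword, not yet its "is" or ";"
    parens = 0
    kinds = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok == "(":
            parens += 1
        elif tok == ")":
            parens -= 1
        elif tok in UNIT_KEYWORDS:
            pending = True
        elif tok == ";" and parens == 0:
            pending = False                  # it was only a declaration
        elif tok == "is" and pending:
            pending = False
            if nxt not in NO_BODY_AFTER_IS:
                stack.append("decl")
        elif tok == "declare":
            stack.append("decl")
        elif tok == "begin":
            if stack and stack[-1] == "decl":
                kinds.append("begin-divide")  # divider for the open construct
                stack[-1] = "stmt"
            else:
                kinds.append("begin-open")    # starts a new block
                stack.append("stmt")
        elif tok == "end" and nxt not in NON_BLOCK_ENDS:
            if stack:
                stack.pop()
    return kinds


SAMPLE = """
package body Pack_1 is
   X : Integer;
   function F1 return Integer is
      Y : Integer;
   begin
      begin
         null;
      end;
      Y := 1;
      begin
         null;
      end;
      return Y;
   end F1;
begin
   X := 0;
end Pack_1;
"""
```

For SAMPLE, the four "begin" tokens come out as divider, block start,
block start, divider, which matches the comments in the package body
example from the first message.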

Of course, doing that full file scan every time you hit a "begin" is
painfully slow (I implemented it that way at first, just to see). So I
added a caching mechanism; once I've classified a "begin", it is
remembered, until text in front of it is edited.
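
A rough model of that cache, again in Python and not the actual Emacs
implementation. The class name BeginCache and its methods are
hypothetical; buffer positions are plain integers. An edit at some
position drops every classification at or after it, since those are the
ones whose preceding text has changed.

```python
import bisect


class BeginCache:
    """Remembers the role of each classified "begin" by buffer position."""

    def __init__(self):
        self._positions = []   # sorted positions of classified begins
        self._kinds = {}       # position -> "begin-divide" or "begin-open"

    def store(self, pos, kind):
        if pos not in self._kinds:
            bisect.insort(self._positions, pos)
        self._kinds[pos] = kind

    def lookup(self, pos):
        """Return the cached kind, or None if a rescan is needed."""
        return self._kinds.get(pos)

    def invalidate_from(self, edit_pos):
        """Forget every entry at or after the edited position."""
        cut = bisect.bisect_left(self._positions, edit_pos)
        for pos in self._positions[cut:]:
            del self._kinds[pos]
        del self._positions[cut:]


cache = BeginCache()
cache.store(10, "begin-divide")
cache.store(50, "begin-open")
cache.store(90, "begin-divide")
cache.invalidate_from(40)   # text edited at position 40
```

After the edit at 40, the entry at 10 is still valid and the entries at
50 and 90 have to be reclassified on demand.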

-- 
-- Stephe
