From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00
	autolearn=unavailable autolearn_force=no version=3.4.4
Path: 
 eternal-september.org!reader01.eternal-september.org!reader02.eternal-september.org!news.eternal-september.org!mx02.eternal-september.org!feeder.eternal-september.org!news.glorb.com!peer02.iad.highwinds-media.com!news.highwinds-media.com!feed-me.highwinds-media.com!post02.iad.highwinds-media.com!news.flashnewsgroups.com-b7.4zTQh5tI3A!not-for-mail
From: Stephen Leake <stephen_leake@stephe-leake.org>
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
References: <878uc3r2y6.fsf@adaheads.sparre-andersen.dk>
Date: Tue, 02 Jun 2015 17:12:35 -0500
Message-ID: <85twupvjxo.fsf@stephe-leake.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (windows-nt)
Cancel-Lock: sha1:nQjkqFFlv4qQiefQfPA+uQIt3GA=
MIME-Version: 1.0
Content-Type: text/plain
X-Complaints-To: abuse@flashnewsgroups.com
Organization: FlashNewsgroups.com
X-Trace: 83fd6556e2a54e97f808421132
X-Received-Bytes: 5115
X-Received-Body-CRC: 3157748980
Xref: news.eternal-september.org comp.lang.ada:26141
Date: 2015-06-02T17:12:35-05:00
List-Id: <comp.lang.ada>

Jacob Sparre Andersen <sparre@nbi.dk> writes:

> I'm attempting to use OpenToken to parse an Ada subset without much
> success.

(Shameless plug) An alternative approach is to start from the full Ada
grammar in Emacs ada-mode (ada-grammar.wy), and reduce it to what you
want. You'd also have to change the actions; they are focused on
supporting indentation and navigation in Emacs.

That grammar is in a .wy file, which must be processed by wisi-generate
to produce Ada code. The code produced by wisi-generate does not support
Ada actions at the moment (it only supports Emacs elisp, or a process or
dll communicating with Emacs), but you could either add that to
wisi-generate or treat the generated code as a template and add your own
actions. 

wisi-generate also assumes a single token type, so that changes how you
write the actions (more code in the actions, less in the tokens).

If you want to try adding actions to wisi-generate, I suggest you start
from FastToken (in monotone branch org.fasttoken, no release yet). It is
passing all its tests. I am still working on it, so the code will be
changing quite a bit. I plan to completely eliminate the token type
hierarchy, to avoid the run-time dynamic memory management it requires.
But that should be mostly orthogonal to actions. I'd be happy to help
with this; it is on my list of things to do someday, and having an
actual user is fun :).

> Identifier_T          =>
>    Tokenizer.Get (OpenToken.Recognizer.Identifier.Get
>                   (Start_Chars => Ada.Strings.Maps.Constants.Letter_Set,
>                    Body_Chars  => Ada.Strings.Maps.Constants.Alphanumeric_Set)),

This call omits the token argument to Tokenizer.Get, so the default is
used as the token type for Identifier_T; that is Master_Token.Instance,
as you complain later (see opentoken-token-enumerated-analyzer.ads:88).
Change to:

      Identifier_T          =>
         Tokenizer.Get
            (OpenToken.Recognizer.Identifier.Get
                (Start_Chars => Ada.Strings.Maps.Constants.Letter_Set,
                 Body_Chars  => Ada.Strings.Maps.Constants.Alphanumeric_Set),
             Identifiers.Get (Identifier_T)),

I hate default parameters in general; this one is sort of convenient
(most terminals are happy being Master_Token.Instance, and you don't
have to repeat Identifier_T), but it easily leads to this kind of bug
(most default parameters lead to hard-to-find bugs!).
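
As a minimal, self-contained sketch of how a defaulted parameter hides
this kind of mistake (every name below is a hypothetical stand-in, not
the OpenToken API): both calls compile, and nothing warns that the
first one silently picked up the default.

    with Ada.Text_IO; use Ada.Text_IO;

    procedure Default_Demo is

       type Token_Kind is (Plain, Identifier);

       type Syntax_Entry is record
          Pattern : Character;
          Kind    : Token_Kind;
       end record;

       --  Hypothetical stand-in for a Get with a defaulted token
       --  parameter: callers may leave the second argument out.
       function Get
         (Pattern : Character;
          Kind    : Token_Kind := Plain) return Syntax_Entry is
       begin
          return (Pattern => Pattern, Kind => Kind);
       end Get;

       --  Compiles without complaint; Kind is silently Plain.
       Forgot   : constant Syntax_Entry := Get ('a');

       --  What was actually intended.
       Explicit : constant Syntax_Entry := Get ('a', Identifier);

    begin
       Put_Line ("Forgot:   " & Token_Kind'Image (Forgot.Kind));
       Put_Line ("Explicit: " & Token_Kind'Image (Explicit.Kind));
    end Default_Demo;

Running it reports Plain for the first call and Identifier for the
second; removing the default would turn the first call into a compile
error instead.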

Note that the ID parameter to Identifiers.Get could be defaulted; it is
overwritten in Analyzer.Initialize.

To see this in the code: terminal token objects are created on the
parser stack at opentoken-production-parser-lalr-parser.adb:318, as a
copy of a token returned by Analyzer.Get. Analyzer.Get returns the token
stored in the syntax; see opentoken-token-enumerated-analyzer.adb:772.

Nonterminal token objects are created at
opentoken-production-parser-lalr-parser.adb:177, as a copy of the
production left hand side (i.e. Dotted_Identifier below).

> Identifier : constant Identifiers.Instance'Class := Identifiers.Get (Identifier_T);
>
> I declare the grammar as:
>
>    Grammar : constant Production_List.Instance :=
>      Compilation_Unit  <= Dotted_Identifier & EOF and
>      Dotted_Identifier <= Dotted_Identifier & Dot & Identifier + Dotted_Identifiers.Join_Dotted_Identifiers and
>      Dotted_Identifier <= Identifier;

You might think that the use of Identifiers.Instance here does what you
want, but it doesn't; the grammar determines the types only of the
nonterminals, the syntax determines the types of the terminals.
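
A minimal sketch of why the declaration in the grammar cannot help,
assuming (as described above) that the parser only ever copies the
object stored in the syntax. The types are hypothetical stand-ins, not
OpenToken's: a copy of a class-wide object keeps the tag of the object
it was copied from, whatever other declarations exist elsewhere.

    with Ada.Tags;
    with Ada.Text_IO;

    procedure Copy_Demo is

       package Tokens is
          --  Hypothetical stand-in for Master_Token.Instance.
          type Master is tagged record
             ID : Natural := 0;
          end record;

          --  Hypothetical stand-in for Identifiers.Instance.
          type Identifier is new Master with record
             Length : Natural := 0;
          end record;
       end Tokens;

       --  What the syntax holds when the token argument is defaulted ...
       Defaulted : constant Tokens.Master'Class :=
         Tokens.Master'(ID => 1);

       --  ... and when the identifier token is passed explicitly.
       Explicit : constant Tokens.Master'Class :=
         Tokens.Identifier'(ID => 1, Length => 0);

       --  Mimics the parser pushing a copy of the stored token.
       procedure Push (Label : String; Stored : Tokens.Master'Class) is
          Copy : constant Tokens.Master'Class := Stored;
       begin
          Ada.Text_IO.Put_Line
            (Label & Ada.Tags.Expanded_Name (Copy'Tag));
       end Push;

    begin
       Push ("defaulted: ", Defaulted);
       Push ("explicit:  ", Explicit);
    end Copy_Demo;

The first copy carries the tag of Master and the second the tag of
Identifier, so a later view conversion to the identifier type can only
succeed in the second case.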

I suppose I should add that to the OpenToken user manual, but I'd rather
work on the FastToken user manual, which won't have this problem (only
one type for tokens :).

> I'm probably doing something obvious wrong, but what?

Obvious to me, but I've been messing with the lexer code in FastToken
recently; I switched to using regular expressions for the token
recognizers (to be closer to Aflex, which is also now supported). While
doing that, I deleted the default Get above.

Using regular expressions instead of the old OpenToken recognizers is a
big change, but wisi-generate takes care of most of the drudge work for
you (you just have to specify the actual regular expressions).

Glad to hear someone is using OpenToken, and it's fun to actually talk
about this stuff once in a while :).

-- 
-- Stephe