From: Stephen Leake
Newsgroups: comp.lang.ada
Subject: Re: OpenToken: Parsing Ada (subset)?
References: <878uc3r2y6.fsf@adaheads.sparre-andersen.dk>
Date: Tue, 02 Jun 2015 17:12:35 -0500
Message-ID: <85twupvjxo.fsf@stephe-leake.org>

Jacob Sparre Andersen writes:

> I'm attempting to use OpenToken to parse an Ada subset without much
> success.

(Shameless plug) An alternative approach is to start from the full Ada
grammar in Emacs ada-mode (ada-grammar.wy), and reduce it to what you
want. You'd also have to change the actions; they are focused on
supporting indentation and navigation in Emacs.

That grammar is in a .wy file, which must be processed by wisi-generate
to produce Ada code.

The code produced by wisi-generate does not support Ada actions at the
moment (it only supports Emacs elisp, or a process or dll communicating
with Emacs), but you could either add that to wisi-generate or treat
the generated code as a template and add your own actions.
wisi-generate also assumes a single token type, so that changes how you
write the actions (more code in the actions, less in the tokens).

If you want to try adding actions to wisi-generate, I suggest you start
from FastToken (in monotone branch org.fasttoken, no release yet). It is
passing all its tests. I am still working on it, so the code will be
changing quite a bit. I plan to completely eliminate the token type
hierarchy, to avoid the run-time dynamic memory management it requires.
But that should be mostly orthogonal to actions.

I'd be happy to help with this; it is on my list of things to do
someday, and having an actual user is fun :).

>       Identifier_T =>
>         Tokenizer.Get (OpenToken.Recognizer.Identifier.Get
>                          (Start_Chars => Ada.Strings.Maps.Constants.Letter_Set,
>                           Body_Chars  => Ada.Strings.Maps.Constants.Alphanumeric_Set)),

This line says to use the default 'Get' routine to specify the token
type to use for Identifier_T; that is Master_Token.Instance, as you
complain later (see opentoken-token-enumerated-analyzer.ads:88).

Change to:

      Identifier_T =>
        Tokenizer.Get (OpenToken.Recognizer.Identifier.Get
                         (Start_Chars => Ada.Strings.Maps.Constants.Letter_Set,
                          Body_Chars  => Ada.Strings.Maps.Constants.Alphanumeric_Set),
                       Identifiers.Get (Identifier_T)),

I hate default parameters in general; this one is sort of convenient
(most terminals are happy being Master_Token.Instance, and you don't
have to repeat Identifier_T), but easily leads to this kind of bug (most
default parameters lead to hard-to-find bugs!).

Note that the ID parameter to Identifiers.Get could be defaulted; it is
overwritten in Analyzer.Initialize.

To see this in the code: terminal token objects are created on the
parser stack at opentoken-production-parser-lalr-parser.adb:318, as a
copy of a token returned by Analyzer.Get. Analyzer.Get returns the
token stored in the syntax; see
opentoken-token-enumerated-analyzer.adb:772.
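
As a rough, self-contained sketch of why the syntax entry (and not the
grammar) fixes the run-time type of a terminal: the code below stores a
prototype token behind a defaulted parameter and then checks the tag of
what was stored. All the names here (Master_Token, Identifier_Token,
Get, Token_Access) are hypothetical stand-ins for illustration; this is
not the OpenToken API, only the same pattern in miniature.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Terminal_Type_Sketch is

   --  Hypothetical stand-ins for a token type hierarchy; not OpenToken.
   type Master_Token is tagged null record;
   type Identifier_Token is new Master_Token with null record;

   type Token_Access is access Master_Token'Class;

   Plain_Prototype      : Master_Token;
   Identifier_Prototype : Identifier_Token;

   --  Stand-in for building a syntax entry: it stores a copy of the
   --  prototype token. The defaulted parameter mirrors the convenience
   --  (and the hazard) discussed above.
   function Get
     (New_Token : Master_Token'Class := Plain_Prototype)
      return Token_Access
   is
   begin
      --  A class-wide copy keeps the tag of the object passed in.
      return new Master_Token'Class'(New_Token);
   end Get;

   Defaulted : constant Token_Access := Get;
   Explicit  : constant Token_Access := Get (Identifier_Prototype);

begin
   Put_Line ("defaulted is identifier: "
             & Boolean'Image (Defaulted.all in Identifier_Token'Class));
   Put_Line ("explicit is identifier:  "
             & Boolean'Image (Explicit.all in Identifier_Token'Class));
end Terminal_Type_Sketch;
```

The first line reports FALSE and the second TRUE: whatever object is
stored in the entry is what later copies will be, no matter how the
token is declared elsewhere.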
Nonterminal token objects are created at
opentoken-production-parser-lalr-parser.adb:177, as a copy of the
production left-hand side (i.e. Dotted_Identifier below).

>    Identifier : constant Identifiers.Instance'Class := Identifiers.Get (Identifier_T);
>
> I declare the grammar as:
>
>    Grammar : constant Production_List.Instance :=
>      Compilation_Unit  <= Dotted_Identifier & EOF and
>      Dotted_Identifier <= Dotted_Identifier & Dot & Identifier + Dotted_Identifiers.Join_Dotted_Identifiers and
>      Dotted_Identifier <= Identifier;

You might think that the use of Identifiers.Instance here does what you
want, but it doesn't; the grammar determines the types only of the
nonterminals, while the syntax determines the types of the terminals.

I suppose I should add that to the OpenToken user manual, but I'd
rather work on the FastToken user manual, which won't have this problem
(only one type for tokens :).

> I'm probably doing something obvious wrong, but what?

Obvious to me, but I've been messing with the lexer code in FastToken
recently; I switched to using regular expressions for the token
recognizers (to be closer to Aflex, which is also now supported). While
doing that, I deleted the default Get above.

Using regular expressions instead of the old OpenToken recognizers is a
big change, but wisi-generate takes care of most of the drudge work for
you (you just have to specify the actual regular expressions).

Glad to hear someone is using OpenToken, and it's fun to actually talk
about this stuff once in a while :).

--
-- Stephe