comp.lang.ada
 help / color / mirror / Atom feed
From: mtrenkmann <martin.trenkmann@googlemail.com>
Subject: Re: OpenToken: Handling the empty word token
Date: Mon, 30 Jan 2012 08:28:57 -0800 (PST)
Date: 2012-01-30T08:28:57-08:00	[thread overview]
Message-ID: <7756ad67-4ce1-4fb8-a4a1-f8136815ac9b@p13g2000yqd.googlegroups.com> (raw)
In-Reply-To: 82k44cayui.fsf@stephe-leake.org

On Jan 28, 11:46 am, Stephen Leake <stephen_le...@stephe-leake.org>
wrote:
> mtrenkmann <martin.trenkm...@googlemail.com> writes:
> > Very often grammars have so called epsilon-productions where one
> > alternative for a non-terminal symbol points to theemptyword
> > (epsilon).
>
> I've never seen this, but I haven't seen many grammars.
>
Here is the ASN.1 grammar I have to deal with (see Annex L):
http://www.itu.int/rec/T-REC-X.680-200811-I/en

> I would code that as:
>
> Settings <= required_settings + action and
> Settings <= required_settings & optional + action
>
> Which I assume you are doing as a workaround for what you want.
>
> Supporting epsilon is an interesting idea. It does reduce the amount of
> user-written code, especially when there is more than one optional part.
>
This is exactly why I need epsilon token support in the RHS of a
production. If you look at the top-level production in ASN.1, namely
"ModuleDefinition", there are already 4 optional non-terminals (the
epsilon token is called "empty" here). Modeling the epsilon token at
that higher level means to provide 2^4 RHS combinations for
"ModuleDefinition". No good!

> > Is there a way to instrument the parser to silently accept the epsilon
> > token whenever it expects it without consuming a token from the lexer,
>
> It's been a while since I messed withOpenTokenat that level, but it
> looks like you could make this happen by changingopentoken-recognizer-nothing.adb Analyze to report Verdict := Matches.
> You might give that a try.
>
> I suspect this would have the problem that Dmitry pointed out; the
> epsilon token would always match, so you'd get lots of ambiguities.
>
I tried it and yes I got lots of ambiguities.

> To get rid of the ambiguities, you would need to feed information about
> what the parser expects into the lexer, so Epsilon would only match if
> the parser expects it. That is possible, but there's no formal mechanism
> for it inOpenToken.
>
> You could define a new token type Epsilon, modeled on Nothing, and add a
> Set_Expected procedure. Then call that from somewhere inopentoken-production-parser-lalr.adb Parse; probably just before each
> call to Find_Next. The list of expected tokens is given by the functionopentoken-production-parser-lalr.adb Expecting (currently used in error
> messages).
>
Sounds feasible to me. I will implement that and let you know.

> -- Stephe

Ok, thanks for all your helpful hints. In the meanwhile I came up with
the following workaround, that actually works fine.
The idea is to perform some kind of recovery in the Parse procedure if
Action.Verb = Error due to the missing of the epsilon token. In that
case I check if epsilon is one of the expected tokens, and if so I set
Current_State.Seen_Token to epsilon and prevent the parser from
reading a new token from the lexer in the next Shift step. Here is the
code with the important parts:

-- The epsilon token becomes a new generic constant of the lexer.
generic
   Last_Terminal : in Token_ID := Token_ID'Last;
   Epsilon_Token : in Token_ID;
package OpenToken.Token.Enumerated.Analyzer is
   -- new function needed by the parser
   function Get_Syntax (Analyzer : in Instance) return Syntax;
   function Epsilon_ID (Analyzer : in Instance) return Terminal_ID;
end OpenToken.Token.Enumerated.Analyzer;

-- This is no complete code! Just the important parts.
package body OpenToken.Production.Parser.LALR is
   overriding procedure Parse (Parser : in out Instance) is
      -- a flag to prevent the parser from reading a new token in
Shift step
      Get_Next_From_Analyzer : Boolean := True;
   begin
      loop
         case Action.Verb is
         when Shift =>
            if Get_Next_From_Analyzer then
               --  Get the next token
               begin
                  Tokenizer.Find_Next (Parser.Analyzer);
               exception
                  when E : Syntax_Error =>
                     raise Syntax_Error with
                     Integer_Image (Line (Parser)) &
                     ":" &
                     Integer_Image (Column (Parser) - 1) &
                     " " &
                     Ada.Exceptions.Exception_Message (E);
               end;
            else
               -- Use the current token again in the next call of
Action_For
               Get_Next_From_Analyzer := True;
            end if;
         when Error =>
            -- BEGIN epsilon token hack
            -- Try to recover if epsilon is one of the expected tokens
            declare
               use Token;
               Expected_Tokens : constant Token_Array
                 := Expecting (Parser.Table, Current_State.State);
               Is_Epsilon_Expected : Boolean := False;
            begin
               for I in Expected_Tokens'Range loop
                  if Expected_Tokens (I) = Tokenizer.Epsilon_Token
then
                     Is_Epsilon_Expected := True;
                     exit;
                  end if;
               end loop;
               if Is_Epsilon_Expected then
                  -- Create an on-demand epsilon token
                  Current_State.Seen_Token := new
Token.Class'(Parser.Analyzer.Get_Syntax
(Parser.Analyzer.Epsilon_ID).Token_Handle.all);
                  Get_Next_From_Analyzer := False;
            -- END epsilon token hack
               else
                  -- Clean up
                  -- Raise Sytax_Error
               end if;
            end;
         end case;
      end loop;
end OpenToken.Production.Parser.LALR;

I am wondering if that behavior cannot be implemented in the table-
lookup procedure. That is when I call Action_For with an unexpected
token, while there is the epsilon as one of the expected tokens, then
this epsilon could be accepted automatically, the parser changes its
state accordingly, and the unexpected token is checked again. Don't
know if that is possible, since I am not yet familiar with the
internals of table-driven parsing.

-- Martin



  reply	other threads:[~2012-01-30 16:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-27 16:22 OpenToken: Handling the empty word token mtrenkmann
2012-01-27 16:48 ` Dmitry A. Kazakov
2012-01-28  3:42   ` Randy Brukardt
2012-01-29 17:45     ` Stephen Leake
2012-01-31  0:56       ` Randy Brukardt
2012-01-31  9:09         ` Georg Bauhaus
2012-01-31 12:16         ` Stephen Leake
2012-02-02  1:39           ` Randy Brukardt
2012-01-28 10:46 ` Stephen Leake
2012-01-30 16:28   ` mtrenkmann [this message]
2012-01-30 18:34     ` Dmitry A. Kazakov
2012-01-31 12:58     ` Stephen Leake
replies disabled

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox