comp.lang.ada
From: Stephen Leake <stephen_leake@stephe-leake.org>
Subject: Re: OpenToken: Handling the empty word token
Date: Sun, 29 Jan 2012 12:45:54 -0500
Message-ID: <82ehuibdwt.fsf@stephe-leake.org>
In-Reply-To: jfvqqu$83e$1@munin.nbi.dk

"Randy Brukardt" <randy@rrsoftware.com> writes:

> "Dmitry A. Kazakov" <mailbox@dmitry-kazakov.de> wrote in message 
> news:1jvlv7i0tn14u.b5d2cwsqhl2h$.dlg@40tude.net...
>> On Fri, 27 Jan 2012 08:22:12 -0800 (PST), mtrenkmann wrote:
>>
>>> Is there a way to instrument the parser to silently accept the epsilon
>>> token whenever it expects it without consuming a token from the lexer,
>>> or is it a common convention to translate each grammar into an epsilon-
>>> free representation?
>>
>> I use neither explicit grammars nor OpenToken, so it is possible that I
>> didn't really understand the problem you have.
>
> Like Dmitry, I don't use OpenToken, but I do use a LALR(1) parser generator 
> (ours originates in a University of Wisconsin research project from the late 
> 1970s).
>
> In all of the grammars I've seen, you don't write anything for an epsilon 
> production; that's because you are matching nothing. But there is no problem 
> in matching nothing, so long as your grammar generator is powerful enough 
> (uses at least LALR(1) parsing, or perhaps LR(1) parsing). In that case, 
> matching nothing works so long as the follow sets are disjoint (something 
> that fails to be true periodically in our Ada grammar).
>
> For instance, here's the grammar for parameter modes from the Janus/Ada 
> compiler grammar:
>
> mode ::= IN ## 93
>     | OUT ## 94
>     | IN OUT ## 95
>     |   ## 198
>
> Note that the last production is an epsilon production. The ## part gives an 
> action number associated with the matching of that particular alternative of 
> this production. The ## part also marks the end of the production (it's 
> optional, and | also ends a production -- but it's required on the last 
> alternative as the grammar of our grammar uses insignificant line endings 
> like Ada does).
>
> I'd be surprised if OpenToken didn't have something similar; 

Not quite. Because OpenToken builds the grammar from Ada expressions
(overloaded operators applied to token objects), a right-hand side
cannot simply be left blank; we need an explicit Epsilon token (full
code below):

   Grammar : constant Production_List.Instance :=
     Tokens.Parse_Sequence <= Tokens.Paren_Left & Tokens.Mode & Tokens.Paren_Right + Arg_Action'Access and
     Tokens.Mode <= Tokens.In_Tok + Mode_Action'Access and
     Tokens.Mode <= Tokens.Out_Tok + Mode_Action'Access and
     Tokens.Mode <= Tokens.In_Tok & Tokens.Out_Tok + Mode_Action'Access and
     Tokens.Mode <= Tokens.Epsilon + Mode_Action'Access;

> and if it doesn't, you probably need to upgrade to a better grammar
> generator.

One way to do that is to improve OpenToken :).

In this case, we might be able to provide a unary "+" that builds an
empty right-hand side, so the explicit Epsilon token would not be
needed, but I didn't try that.
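
To make the "+" idea concrete, here is a minimal self-contained sketch.
It does not use OpenToken at all: Token_ID, Token_List, Action and
Right_Hand_Side are toy stand-ins invented for illustration, and the
real OpenToken declarations are different. It only shows that Ada lets
a one-operand "+" coexist with the two-operand "+", so an empty
right-hand side could be written as just +Mode_Action'Access.

```ada
with Ada.Text_IO;

procedure Unary_Plus_Sketch is

   --  Toy stand-ins for illustration only; NOT OpenToken's declarations.
   type Token_ID is (In_ID, Out_ID);
   type Token_List is array (Positive range <>) of Token_ID;
   type Action is access procedure;

   Max_Length : constant := 8;

   type Right_Hand_Side is record
      Length : Natural := 0;
      Tokens : Token_List (1 .. Max_Length) := (others => In_ID);
      Act    : Action;
   end record;

   --  Two-operand "+": attach an action to a non-empty token list.
   function "+" (Left : Token_List; Right : Action) return Right_Hand_Side is
      Result : Right_Hand_Side;
   begin
      Result.Length := Left'Length;
      Result.Tokens (1 .. Left'Length) := Left;
      Result.Act := Right;
      return Result;
   end "+";

   --  One-operand "+": a right-hand side with no tokens, i.e. epsilon.
   function "+" (Right : Action) return Right_Hand_Side is
   begin
      return (Length => 0, Tokens => (others => In_ID), Act => Right);
   end "+";

   procedure Mode_Action is
   begin
      Ada.Text_IO.Put_Line ("mode action");
   end Mode_Action;

   --  "&" here is the predefined array concatenation on Token_List.
   In_Out_RHS  : constant Right_Hand_Side := In_ID & Out_ID + Mode_Action'Access;
   Epsilon_RHS : constant Right_Hand_Side := +Mode_Action'Access;

begin
   Ada.Text_IO.Put_Line ("in out RHS length:" & Natural'Image (In_Out_RHS.Length));
   Ada.Text_IO.Put_Line ("epsilon RHS length:" & Natural'Image (Epsilon_RHS.Length));
   Epsilon_RHS.Act.all;
end Unary_Plus_Sketch;
```

The two right-hand sides report lengths 2 and 0. Whether the LALR
generator in OpenToken would then handle a zero-length right-hand side
correctly is a separate question that this sketch does not answer.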

pragma License (GPL);

with Ada.Text_IO;
with OpenToken.Production.List;
with OpenToken.Production.Parser.LALR;
with OpenToken.Production.Parser;
with OpenToken.Recognizer.Character_Set;
with OpenToken.Recognizer.End_Of_File;
with OpenToken.Recognizer.Keyword;
with OpenToken.Recognizer.Nothing;
with OpenToken.Text_Feeder.String;
with OpenToken.Token.Enumerated.Analyzer;
with OpenToken.Token.Enumerated.List;
with OpenToken.Token.Enumerated.Nonterminal;
procedure Debug is

   type Token_ID_Type is
     (EOF_ID,
      Epsilon_ID,
      In_ID,
      Out_ID,
      Paren_Left_ID,
      Paren_Right_ID,
      Whitespace_ID,

      --  non-terminals
      Mode_ID,
      Parse_Sequence_ID);

   package Master_Token is new OpenToken.Token.Enumerated (Token_ID_Type);
   package Token_List is new Master_Token.List;
   package Nonterminal is new Master_Token.Nonterminal (Token_List);

   package Production is new OpenToken.Production (Master_Token, Token_List, Nonterminal);
   package Production_List is new Production.List;

   use type Production.Instance;        --  "<="
   use type Production_List.Instance;   --  "and"
   use type Production.Right_Hand_Side; --  "+"
   use type Token_List.Instance;        --  "&"

   package Tokens is
      EOF         : constant Master_Token.Class := Master_Token.Get (EOF_ID);
      Epsilon     : constant Master_Token.Class := Master_Token.Get (Epsilon_ID);
      In_Tok      : constant Master_Token.Class := Master_Token.Get (In_ID);
      Out_Tok     : constant Master_Token.Class := Master_Token.Get (Out_ID);
      Paren_Left  : constant Master_Token.Class := Master_Token.Get (Paren_Left_ID);
      Paren_Right : constant Master_Token.Class := Master_Token.Get (Paren_Right_ID);

      --  Nonterminals
      Mode           : constant Nonterminal.Class := Nonterminal.Get (Mode_ID);
      Parse_Sequence : constant Nonterminal.Class := Nonterminal.Get (Parse_Sequence_ID);
   end Tokens;

   package Tokenizer is new Master_Token.Analyzer (Last_Terminal => Whitespace_ID);

   Syntax : constant Tokenizer.Syntax :=
     (EOF_ID         => Tokenizer.Get (OpenToken.Recognizer.End_Of_File.Get, Tokens.EOF),
      Epsilon_ID     => Tokenizer.Get (OpenToken.Recognizer.Nothing.Get),
      In_ID          => Tokenizer.Get (OpenToken.Recognizer.Keyword.Get ("in")),
      Out_ID         => Tokenizer.Get (OpenToken.Recognizer.Keyword.Get ("out")),
      Paren_Left_ID  => Tokenizer.Get (OpenToken.Recognizer.Keyword.Get ("(")),
      Paren_Right_ID => Tokenizer.Get (OpenToken.Recognizer.Keyword.Get (")")),

      Whitespace_ID => Tokenizer.Get
        (OpenToken.Recognizer.Character_Set.Get (OpenToken.Recognizer.Character_Set.Standard_Whitespace))
     );

   procedure Arg_Action
     (New_Token : out Nonterminal.Class;
      Source    : in  Token_List.Instance'Class;
      To_ID     : in  Token_ID_Type)
   is begin
      Nonterminal.Synthesize_Self (New_Token, Source, To_ID);
      Ada.Text_IO.Put_Line ("arg action");
   end Arg_Action;

   procedure Mode_Action
     (New_Token : out Nonterminal.Class;
      Source    : in  Token_List.Instance'Class;
      To_ID     : in  Token_ID_Type)
   is begin
      Nonterminal.Synthesize_Self (New_Token, Source, To_ID);
      Ada.Text_IO.Put_Line ("mode action");
   end Mode_Action;

   Grammar : constant Production_List.Instance :=
     Tokens.Parse_Sequence <= Tokens.Paren_Left & Tokens.Mode & Tokens.Paren_Right + Arg_Action'Access and
     Tokens.Mode <= Tokens.In_Tok + Mode_Action'Access and
     Tokens.Mode <= Tokens.Out_Tok + Mode_Action'Access and
     Tokens.Mode <= Tokens.In_Tok & Tokens.Out_Tok + Mode_Action'Access and
     Tokens.Mode <= Tokens.Epsilon + Mode_Action'Access;

   package OpenToken_Parser is new Production.Parser (Production_List, Tokenizer);
   package LALR_Parser is new OpenToken_Parser.LALR;
   String_Feeder : aliased OpenToken.Text_Feeder.String.Instance;
   Analyzer : constant Tokenizer.Instance := Tokenizer.Initialize (Syntax);
   Command_Parser : LALR_Parser.Instance := LALR_Parser.Generate (Grammar, Analyzer, OpenToken.Trace_Parse);

   use LALR_Parser;
begin
   OpenToken.Text_Feeder.String.Set (String_Feeder, "( in out )");

   Set_Text_Feeder (Command_Parser, String_Feeder'Unchecked_Access);

   --  Read and parse statements from the string until end of string
   loop
      exit when End_Of_Text (Command_Parser);
      Parse (Command_Parser);
   end loop;

end Debug;
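
Randy's point about follow sets can also be seen without any parser
generator. The sketch below is a hand-written recursive-descent
recognizer (not an LALR table, and not OpenToken) for the same
"( mode )" language; all names in it are made up for illustration. The
empty alternative of mode is chosen purely by looking at the next token:
when that token is ")", which is the only token that can follow mode,
nothing is consumed.

```ada
with Ada.Text_IO;

procedure Epsilon_Lookahead_Sketch is

   type Token is (In_Tok, Out_Tok, Paren_Left, Paren_Right, EOF);
   type Token_Array is array (Positive range <>) of Token;

   --  True if Input matches:  ( mode ) <end of input>
   --  where  mode ::= IN | OUT | IN OUT | <empty>
   function Accepts (Input : Token_Array) return Boolean is
      Pos : Positive := Input'First;

      --  Next token without consuming it; EOF past the end of Input.
      function Peek return Token is
      begin
         if Pos in Input'Range then
            return Input (Pos);
         else
            return EOF;
         end if;
      end Peek;

      procedure Mode is
      begin
         case Peek is
            when In_Tok =>
               Pos := Pos + 1;
               if Peek = Out_Tok then
                  Pos := Pos + 1;   --  IN OUT
               end if;
            when Out_Tok =>
               Pos := Pos + 1;
            when Paren_Right =>
               null;   --  epsilon: lookahead is in the follow set of mode
            when Paren_Left | EOF =>
               null;   --  not a mode; the caller's check for ")" rejects it
         end case;
      end Mode;

   begin
      if Peek /= Paren_Left then
         return False;
      end if;
      Pos := Pos + 1;
      Mode;
      if Peek /= Paren_Right then
         return False;
      end if;
      Pos := Pos + 1;
      return Peek = EOF;
   end Accepts;

   procedure Report (Label : String; Input : Token_Array) is
   begin
      Ada.Text_IO.Put_Line (Label & " => " & Boolean'Image (Accepts (Input)));
   end Report;

begin
   Report ("( in out )", (Paren_Left, In_Tok, Out_Tok, Paren_Right));
   Report ("( )",        (Paren_Left, Paren_Right));
   Report ("( out in )", (Paren_Left, Out_Tok, In_Tok, Paren_Right));
end Epsilon_Lookahead_Sketch;
```

The first two inputs are accepted and the third is rejected. No Epsilon
token ever appears in the input; the decision to match nothing is made
from the lookahead alone.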

-- 
-- Stephe



