From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM
	autolearn=ham autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,64eba6a6b76afc79
X-Google-Attributes: gid103376,public
X-Google-ArrivalTime: 2003-03-16 14:07:10 PST
Path: 
 archiver1.google.com!news1.google.com!newsfeed.stanford.edu!news.uchicago.edu!newsfeed.cs.wisc.edu!newsfeed.mathworks.com!kibo.news.demon.net!demon!newsfeed00.sul.t-online.de!newsmm00.sul.t-online.com!t-online.de!news.t-online.com!not-for-mail
From: "Oliver Kellogg" <oliver.kellogg@t-online.de>
Newsgroups: comp.lang.ada
Subject: Re: derived_type_definition ::= [abstract] new subtype_ind
 [record_extension_part]
Date: Mon, 17 Mar 2003 00:06:53 +0100
Organization: T-Online
Message-ID: <b52sfh$9g2$05$1@news.t-online.com>
References: <b4r7ct$9lt$07$1@news.t-online.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
X-Trace: news.t-online.com 1047852338 05 9730 PoMqETYTS-AIi0 030316 22:05:38
X-Complaints-To: abuse@t-online.com
X-ID: XuxaWTZ1ZeesysL+Bff3CR4LuKb1Z0QB9yDPxpJl2wd5Ur-dLBpr8t
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Xref: archiver1.google.com comp.lang.ada:35394
Date: 2003-03-17T00:06:53+01:00
List-Id: <comp.lang.ada>

Pardon the bad style of replying to one's own posting...
I wrote:

> (When making an abstract syntax tree for Ada, I wonder
> whether this is fine-grained enough. In particular, it
> doesn't feel right to have just one node representing
> both alternatives.)

Well, after some more thinking I know where the "bad feeling"
comes from, see the end of the below text. (These are my
internal guidelines for the ANTLR Ada tree design.)

$Id: TREMULA,v 0.0 2003/03/16 22:36:38 kellogg Exp kellogg $

TREe design Meta rULes for Ada

In principle, the tree structure closely follows the RM grammar,
except where good reasons suggest not to.

- Most of the RM rules are mirrored here, both as rules and as tree
  nodes. For an explanation of which rules are not mirrored, see
  the next paragraph.
  Rule names are given in lowercase letters.
  Node types are given in uppercase letters.
  Node types named after RM grammar rules represent the corresponding
  items of the RM.
  Node types not named after RM grammar rules are auxiliary nodes.
  The main purpose of such nodes is to normalize the node structure.
  (For an index of all node types implemented, see the second and
  third part of the "tokens" section of the AdaLexer in ada.g.)

- Only a few RM grammar rules are not mirrored, namely those which
  are trivially reconstructed from finer grained rules/nodes.
  An example of such are the simple_statement and compound_statement
  rules/nodes which are trivially reconstructed from the individual
  statement nodes.
  The reason for omitting this kind of nodes is to avoid burdening
  the tree with "pass-thru" nodes that offer very little added value.

- Additionally, optional items in the RM grammar (i.e. the parts
  enclosed in square brackets) have their own rules, and also their
  own nodes in the tree. The rule name ends in _opt and the node type
  ends in _OPT.
  Subsequent optional items, be they nested or in series, are mapped
  to a single _OPT node, for example:
   [ abstract [ tagged ] ] [ limited ] => ABSTRACT_TAGGED_LIMITED_OPT

- Tree Design Formula:
  To go from an RM rule to the tree, prune the RM rule of terminals.
  The items that remain after this pruning have a direct correspondence
  in the tree node, with the abovementioned augmentation of optional
  items applying.  The sequence of items is preserved in the node.
  The only syntactic items not mirrored in the tree are:
  1. The "END [identifier]" at the end of Ada units et al.
     The optional identifier, if given, is checked to match with the
     corresponding defining_identifier at the start of the unit during
     parsing, but the END keyword and the ending identifier do not
     enter the tree.
  2. The ending semicolon of all statements/declarations.

Here is an example for the mapping from RM grammar to the ANTLR rule,
and to the Abstract Syntax Tree.

  RM grammar rule:
     block_statement ::= [block_statement_identifier:]
        [ DECLARE
             declarative_part ]
        BEGIN
           handled_sequence_of_statements
        END [block_identifier];

  Corresponding ANTLR rule:
     block_statement
  Direct subordinate rules invoked by block_statement:
     statement_identifier_opt
        Handles the "[block_statement_identifier:]".
        This is a node of type STATEMENT_IDENTIFIER_OPT which is
        always present, but whose contents may be empty because the
        block_statement_identifier is optional. The ending colon
        does not appear in the node.
     declare_opt
        Handles the "[ DECLARE declarative_part ]".
        This is a node of type DECLARE_OPT which is always present,
        but whose contents may be empty because the "DECLARE
        declarative_part" is optional.
        The DECLARE keyword does not appear in the node.
     block_body
        Handles the "BEGIN handled_sequence_of_statements".
        Returns a node of type HANDLED_SEQUENCE_OF_STATEMENTS.
        The BEGIN keyword does not appear in the node.
     end_id_opt
        Handles the "END [block_identifier]".
        Checks that the block_identifier matches the
        block_statement_identifier given at the beginning.
        However, no node is constructed.
     semi
        Handles the ending semicolon.
        No node is constructed.
     Note: The existence of the last three rules (block_body, end_id_opt,
     and semi) does not directly follow from the Tree Design Formula
     given above.  In fact, those three rules could be inlined into
     the block_statement rule without ripple.  It just so happens that
     these rules can be reused in a number of other rules.
     Thus, the Tree Design Formula is extended as follows:
  - If a subset of a rule is found to occur in more than one place
    within the grammar, then this subset is factored into a separate
    rule.  The new rule is "synthetic" in that it does not appear in
    the RM grammar.  There may also exist a node type that corresponds
    to the synthetic rule.  However, the design decision about whether
    or not such a node exists is not yet formalized. @@FIXME

  AST node:  type = BLOCK_STATEMENT
     Subordinate nodes:
        STATEMENT_IDENTIFIER_OPT
        DECLARE_OPT
        HANDLED_SEQUENCE_OF_STATEMENTS

==
Applying the Tree Design Formula to the Ada grammar has worked out
fine - with two prominent exceptions:

1. Mentions of PROCEDURE and FUNCTION
   The Annex P grammar always lumps together mentions of PROCEDURE
   and FUNCTION in a single rule.  (The rules are:
   subprogram_specification, access_to_subprogram_definition,
   generic_renaming_declaration, generic_instantiation.)
   However, this is not fine grained enough, i.e. applying the Tree
   Design Formula it is not possible to discard the terminals,
   PROCEDURE and FUNCTION, to arrive at the node structure.
   This problem is solved by maintaining the PROCEDURE and FUNCTION
   terminals within the corresponding nodes.

2. derived_type_definition
   The derived_type_definition rule does not exclude the following:

      TYPE child IS ABSTRACT NEW parent;

   which of course is not valid Ada. In order to maintain a clean
   node design, the derived_type_definition is split into two
   rules/nodes, namely ordinary_derived_type and
   derived_record_extension (rules) resp. ORDINARY_DERIVED_TYPE
   and DERIVED_RECORD_EXTENSION (nodes).