From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,FREEMAIL_FROM autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,64eba6a6b76afc79 X-Google-Attributes: gid103376,public X-Google-ArrivalTime: 2003-03-16 14:07:10 PST Path: archiver1.google.com!news1.google.com!newsfeed.stanford.edu!news.uchicago.edu!newsfeed.cs.wisc.edu!newsfeed.mathworks.com!kibo.news.demon.net!demon!newsfeed00.sul.t-online.de!newsmm00.sul.t-online.com!t-online.de!news.t-online.com!not-for-mail From: "Oliver Kellogg" Newsgroups: comp.lang.ada Subject: Re: derived_type_definition ::= [abstract] new subtype_ind [record_extension_part] Date: Mon, 17 Mar 2003 00:06:53 +0100 Organization: T-Online Message-ID: References: Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Trace: news.t-online.com 1047852338 05 9730 PoMqETYTS-AIi0 030316 22:05:38 X-Complaints-To: abuse@t-online.com X-ID: XuxaWTZ1ZeesysL+Bff3CR4LuKb1Z0QB9yDPxpJl2wd5Ur-dLBpr8t X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Xref: archiver1.google.com comp.lang.ada:35394 Date: 2003-03-17T00:06:53+01:00 List-Id: Pardon the bad style of replying to one's own posting... I wrote: > (When making an abstract syntax tree for Ada, I wonder > whether this is fine-grained enough. In particular, it > doesn't feel right to have just one node representing > both alternatives.) Well, after some more thinking I know where the "bad feeling" comes from, see the end of the below text. (These are my internal guidelines for the ANTLR Ada tree design.) $Id: TREMULA,v 0.0 2003/03/16 22:36:38 kellogg Exp kellogg $ TREe design Meta rULes for Ada In principle, the tree structure closely follows the RM grammar, except where good reasons suggest not to. - Most of the RM rules are mirrored here, both as rules and as tree nodes. For an explanation of which rules are not mirrored, see the next paragraph. Rule names are given in lowercase letters. Node types are given in uppercase letters. Node types named after RM grammar rules represent the corresponding items of the RM. Node types not named after RM grammar rules are auxiliary nodes. The main purpose of such nodes is to normalize the node structure. (For an index of all node types implemented, see the second and third part of the "tokens" section of the AdaLexer in ada.g.) - Only a few RM grammar rules are not mirrored, namely those which are trivially reconstructed from finer grained rules/nodes. An example of such are the simple_statement and compound_statement rules/nodes which are trivially reconstructed from the individual statement nodes. The reason for omitting this kind of nodes is to avoid burdening the tree with "pass-thru" nodes that offer very little added value. - Additionally, optional items in the RM grammar (i.e. the parts enclosed in square brackets) have their own rules, and also their own nodes in the tree. The rule name ends in _opt and the node type ends in _OPT. Subsequent optional items, be they nested or in series, are mapped to a single _OPT node, for example: [ abstract [ tagged ] ] [ limited ] => ABSTRACT_TAGGED_LIMITED_OPT - Tree Design Formula: To go from an RM rule to the tree, prune the RM rule of terminals. The items that remain after this pruning have a direct correspondence in the tree node, with the abovementioned augmentation of optional items applying. The sequence of items is preserved in the node. The only syntactic items not mirrored in the tree are: 1. The "END [identifier]" at the end of Ada units et al. The optional identifier, if given, is checked to match with the corresponding defining_identifier at the start of the unit during parsing, but the END keyword and the ending identifier do not enter the tree. 2. The ending semicolon of all statements/declarations. Here is an example for the mapping from RM grammar to the ANTLR rule, and to the Abstract Syntax Tree. RM grammar rule: block_statement ::= [block_statement_identifier:] [ DECLARE declarative_part ] BEGIN handled_sequence_of_statements END [block_identifier]; Corresponding ANTLR rule: block_statement Direct subordinate rules invoked by block_statement: statement_identifier_opt Handles the "[block_statement_identifier:]". This is a node of type STATEMENT_IDENTIFIER_OPT which is always present, but whose contents may be empty because the block_statement_identifier is optional. The ending colon does not appear in the node. declare_opt Handles the "[ DECLARE declarative_part ]". This is a node of type DECLARE_OPT which is always present, but whose contents may be empty because the "DECLARE declarative_part" is optional. The DECLARE keyword does not appear in the node. block_body Handles the "BEGIN handled_sequence_of_statements". Returns a node of type HANDLED_SEQUENCE_OF_STATEMENTS. The BEGIN keyword does not appear in the node. end_id_opt Handles the "END [block_identifier]". Checks that the block_identifier matches the block_statement_identifier given at the beginning. However, no node is constructed. semi Handles the ending semicolon. No node is constructed. Note: The existence of the last three rules (block_body, end_id_opt, and semi) does not directly follow from the Tree Design Formula given above. In fact, those three rules could be inlined into the block_statement rule without ripple. It just so happens that these rules can be reused in a number of other rules. Thus, the Tree Design Formula is extended as follows: - If a subset of a rule is found to occur in more than one place within the grammar, then this subset is factored into a separate rule. The new rule is "synthetic" in that it does not appear in the RM grammar. There may also exist a node type that corresponds to the synthetic rule. However, the design decision about whether or not such a node exists is not yet formalized. @@FIXME AST node: type = BLOCK_STATEMENT Subordinate nodes: STATEMENT_IDENTIFIER_OPT DECLARE_OPT HANDLED_SEQUENCE_OF_STATEMENTS == Applying the Tree Design Formula to the Ada grammar has worked out fine - with two prominent exceptions: 1. Mentions of PROCEDURE and FUNCTION The Annex P grammar always lumps together mentions of PROCEDURE and FUNCTION in a single rule. (The rules are: subprogram_specification, access_to_subprogram_definition, generic_renaming_declaration, generic_instantiation.) However, this is not fine grained enough, i.e. applying the Tree Design Formula it is not possible to discard the terminals, PROCEDURE and FUNCTION, to arrive at the node structure. This problem is solved by maintaining the PROCEDURE and FUNCTION terminals within the corresponding nodes. 2. derived_type_definition The derived_type_definition rule does not exclude the following: TYPE child IS ABSTRACT NEW parent; which of course is not valid Ada. In order to maintain a clean node design, the derived_type_definition is split into two rules/nodes, namely ordinary_derived_type and derived_record_extension (rules) resp. ORDINARY_DERIVED_TYPE and DERIVED_RECORD_EXTENSION (nodes).