From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.5-pre1 (2020-06-20) on
	ip-172-31-74-118.ec2.internal
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=3.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.5-pre1
Date: 19 Nov 92 18:45:04 GMT
From: cis.ohio-state.edu!zaphod.mps.ohio-state.edu!darwin.sura.net!sgiblab!cs.u
oregon.edu!news.uoregon.edu!news.u.washington.edu!usenet.coe.montana.edu!hydrog
en.oscs.montana.edu!dpoole@ucbvax.  (David Poole)
Subject: GNU-NYU Ada project
Message-ID: <1992Nov19.184504.22639@coe.montana.edu>
List-Id: <comp.lang.ada>

  One of the professors at my university posted this to our local newsgroup. 
I haven't seen anything here on this, so I thought I'd post it and see if
anyone can confirm/deny this report.  

  A few things in this report seem a mite confusing.  It mentions the parser
running around 1,000,000 lines a minute on a 16 MHz 386.  Is this realistic? 
Also, the parser uses recursive descent.  Why?  I thought LALR was the best
possible.

-----------------------------------cut here--------------------------------

          GNAT Project
          GNU-NYU Ada Translator.

         GNAT Report 0001-921101

            November 1, 1992

1. Introduction.
----------------

This is the first summary of activities of the project since its official
start in July 1992. The goal of the project is the construction of a portable
Ada9X compiler, based on the following three components: 

a) The very successful GCC back-end. 

b) The Ada/Ed front-end and tasking library.

c) The Ada9X extensions developed by the Implementation Analysis Team
(i.e. the NYUADA project) itself) as part of the Ada9X project.  

We will demonstrate some of the running portions of the system at the 
Triada92 conference this month in Orlando. To sum up: the parser is complete,
the semantic analyzer handles basic declarations, subprograms and packages,
the tree transformer can process basic declarations and statement forms, all
of which makes it possible to execute very simple programs. We are on
schedule to demonstrate most of the Ada9X functionality at the Ada/Europe
conference in June 1993, and to demonstrate a fully bootstrapped system in
December 1993, which is the end date of the project. 

We have focused much of our work on the design and implementation of 
convenient abstract interfaces between phases of the compiler. This is
one of the obvious benefits of the use of Ada, but we were struck by the way
in which the language forced us in this direction, and led us to complete
and polished interfaces first. A small testimonial to the extent to which
the language can influence good design.

II. System overview.
--------------------

The GNAT system includes the following modules:

a) A scanner and parser, written in Ada 83. The parser produces an abstract
syntax tree with sufficient information to reproduce exactly the source
program (including token placement).

b) A semantic analyzer, which annotates the AST with type information,
performs name and overload resolution, and a large number of legality
checks. The output of the semantic analyzer is the annotated AST. This
phase is written in Ada 83 as well.

c) A tree transformer that traverses the AST and constructs a semantically
equivalent tree, which is input to the GCC back end. This phase is written
in C. Communication between semantics and transformation is by means of a
file (at least conceptually).

d) An enhanced GCC back end, which emits all required constraint checks
and recognizes all instructions that may raise hardware exceptions. 

e) A library manager. To maximize compatibility with the current library
model of GCC, no separate library file is created, but all necessary library
information is stored directly in .o files, and recreated when needed by
the library manager. We retain the basic principle that the compilation of
one source file produces one object file. This is not a conventional model
for Ada systems, but it will simplify the construction of mixed-language
systems, and will look very familiar to the Unix community.  The library
manager is written in Ada.

f) An Ada-specific binder that assembles subunits, performs late inlining
and generic instantiations, and removes unnecessary access-before-elaboration
checks. This is written in Ada.

g) A run-time system, whose main components are: tasking, I/O, and exception
handling. Most of this is written in Ada.

The following sections detail the design and current status of each of these.

III. The Scanner/Parser.
------------------------

The scanner/parser is complete. We have no performance figures yet, but
it appears to be (when compiling itself) more than an order of magnitude
faster than the previous Ada/Ed parser. The current parser derives from 
an Ada 83 syntax checker written by R.B.K.D in assembly language, whose
awesome performance was of the order of 1,000,000 lines/min on a 16 Mz 386.
Even without any more precise figures, we are confident that this will not 
be the bottleneck of the system!

The parser uses recursive descent and has an elaborate error recovery
mechanism, which includes heuristics to make use of program indentation
(on the reasonable assumption that most programmers use indentation to
reflect significant aspects of the nesting structure of their program). 
The parser produces almost no cascaded errors on all the ACVC B-tests,
and seems to recover as well as the previous LALR parser of Ada/Ed.
The parser includes an Ada83 switch. Note however that we have no plans
of producing a complete Ada 83 system: our goal is Ada 9X, and we are
NOT starting with an Ada 83 subset.  

IV. The Abstract Syntax Tree.
-----------------------------

The Scanner/Parser produces an AST and a names table. The low-level details
of the data-structure are hidden by means of a procedural interface, which
includes a retrieval procedure for each non-terminal in the language. All
these retrieval procedures are intended to be inlined, but currently they
include copious tracing and debugging information. In the interest of
efficiency, we have chosen to use very simple data structures to describe the
AST (for example, we make no use of discriminated records) and the code
includes a significant amount of internal consistency checks (essentially,
a small but strategically placed collection of discriminant checks). Some
future version of GNAT, written in Ada 9X, might display an agressive
use of tagged types and type extensions.

There is no separate symbol table. Each defining identifier (i.e. the
defining occurrence of a program entity) is annotated with semantic
information, such as its type. Each use occurrence is made to point to the
corresponding defining occurrence, after name resolution. The AST thus
resembles a DIANA representation. Although we have not examined the question
in full detail, it seems that our AST structure will support an ASIS package
with very little effort.

Two packages provide interfaces to the AST: Syntax_Info describes the
tree as it is produced by the parser, and Entity_Info describes the
semantic annotations. Both of these interfaces are completely procedural:
no indication of the actual layout of AST nodes is visible to any client of 
front-end. To support the code of the tree transformer, which is written in
C, there are C translations of these packages. The header files for these
are produced automatically (by means of a SPITBOL program) to insure that
they remain consistent with the Ada versions.

V. The Semantic Analyzer.
-------------------------

The semantic analyzer performs several traversals of the AST, to produce
semantic annotations, apply legality checks, and expand constructs whose
direct translation into the GCC tree form would be awkward (e.g. aggregates). 
The first (and largest) pass of the analyzer performs name, type and
overload resolution, and collects most semantic attributes. This phase is
being written following the general organization of the Ada/Ed front-end
(the SETL version, which includes already such Ada9X features as tagged
types and extensions, and generic formal packages).  We have chosen the
following package structure for this phase: there is one package for each
chapter of the Ada Reference manual, and one driver package. All chapter
packages are subunits of the driver. 

package Sem1 is            -- Driver 
    procedure Analyze (T: Node_ID);    -- top-level procedure.
    package Sem2 is ...          -- pragma handling.
    package Sem3 is ...          -- declarations and types.
    ...
end Sem1;
package body Sem1 is
    ...
    package body Sem2 is separate.
    package body Sem4 is separate.
    ...
end Sem1 ;
    
Use clauses in each proper body simplify the invocation of procedures
that are used everywhere (for example, all packages use Sem4 and Sem8, but
no one needs easy access to the contents of Sem11). 

As of 11/1, enough of Sem1, Sem3, Sem5, Sem6, Sem7, and Sem8 is written
to handle scalar declarations and types, control structures, subprograms
and calls, packages, and simple name resolution. The design of overload
resolution data structures and algorithms is complete but not implemented
yet. 

VI. The Tree transformer.
-------------------------

The tree transformer performs the critical task of mapping the semantics
the AST into the corresponding GCC tree, mimicking the actions of other
GCC front-ends. This is done by traversing the AST and invoking tree
constructors provided in GCC to  build tree fragments that are eventually
assembled into the tree of one complete compilation unit. The traversal of
the AST uses the procedural interfaces described above. The description of
the GCC tree constructors can be found in the GCC documentation, and we will
not say anything about it here, except to indicate that Ada-specific nodes
have been added, to describe constraint checks on expressions and values.
We expect to make a few additional extensions to the GCC tree, but we intend
to keep them to a minimum. For example, the handling of in-out parameters
is done currently by special casing single parameters and by constructing
a record that is passed by reference in the case of multiple in-out parameters.
This can be done without any modifications to the GCC tree structure.

On the other hand, semantic details that may be of use to other GCC
languages will be implemented directly in the code generator. This is the
case of exceptions for example, given that they will be used by C++ as well
as Ada.

We explored (for the space of a week) the option of writing the transformer
in Ada, with extensive use of the pragma INTERFACE to invoke the GCC
tree constructors. In spite of the appeal of writing more of the system in
Ada, we decided that this would lead to a system that would be exceedingly
difficult to design cleanly, that would be hard to debug, and that would
lead to messy heterogeneous structures.
As of 11/1, the tree transformer can generate the GCC trees corresponding 
to subprograms, simple control structures, and arithmetic expressions. 

VIII. The code generator.
-------------------------

The GCC back end is of course the largest portion of the system. It includes
a multi-pass retargetable code generator, and has been ported to more than
25 machines so far. It generates excellent code, comparable (and at times
superior) to that of the best commercial compilers. The architectural
details of each target are encapsulated in a machine description file that
specifies precisely the semantics of all machine operations, in terms of a
common register transfer language. In order to support Ada, the machine
description formalism must be extended to specify more precisely the 
relationship between instructions and exceptions. The code generator must
respect Ada semantics and inhibit code motion where prohibited. Richard Kenner,
the author of several GCC ports, and Richard Stallman, the creator of GCC,
are designing these extensions. 

IX. The library manager.
------------------------

Design of the library manager is in its very early stages. We have decided
that the information to be stored in object files is only that found in the
AST, i.e. no GCC-generated information is ever preserved. This means that
the GCC trees that correspond to package specifications appearing in a context
clause are reconstructed rather than just retrieved. This corresponds to
the standard handling of header files in C, and does not seem to be onerous
(but once we have measurements we may change our minds!). Similarly, stack
layout is recomputed as needed. This adds to the cost of compiling proper
bodies of subunits, but given that the compilation of proper bodies requires
reconstructing the visibility environment of the stub, all the declarations
that are visible at the location of the stub must be reprocessed anyway. The
advantage is that the library manager has deal only with one data structure,
namely the AST.

VII.  The rest.
---------------

The design of the  binder and run-time system will be described in future
reports. As portions of the system become more stable we will make public
those interfaces that seem mature enough for external consumption.


 -----------------------------------end cut --------------------------------