From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,2308afbbe4ecec0b
X-Google-Attributes: gid103376,public
From: Robert A Duff <bobduff@world.std.com>
Subject: Re: Subverting 'Access for Sub-programs
Date: 1999/08/06
Message-ID: <wccvhatnima.fsf@world.std.com>
X-Deja-AN: 509580547
Sender: bobduff@world.std.com (Robert A Duff)
References: <wcczp06jj39.fsf@world.std.com>
 <Qylq3.465$06.38224@typhoon-sf.snfc21.pbi.net>
Organization: The World Public Access UNIX, Brookline, MA
Newsgroups: comp.lang.ada
Date: 1999-08-06T00:00:00+00:00
List-Id: <comp.lang.ada>

tmoran@bix.com writes:

> > ... true of 'Unchecked_Access on subprograms,
> >but please note that the design team never wanted to allow that.
>   Have I missed a message where the safe method the design team
> contemplated was described?

Not a *recent* message, but it's been discussed before.
Here's the language study note that folks have been
referring to.  Note that I wrote this some years ago,
and I don't necessarily still agree with myself on every
detail.  ;-)

There were actually *two* safe methods proposed -- one involving
compile-time checks and one involving run-time checks.  I prefer the
compile-time-checked version.  Robert Dewar is quite correct that
there are things that 'Unrestricted_Access can do that the proposals
below can't.

!topic LSN on General Access Types
!key LSN-1083 on General Access Types
!reference RM9X-3.10.2;4.0
!reference RM9X-3.10.2(16);4.0
!reference AARM-12.3(12.p,12.q);4.0
!reference LSN-1042 on Accessibility Checks in Generics
!from Bob Duff $Date: 94/02/15 15:17:44 $ $Revision: 1.5 $
!discussion

Two issues related to access types and the accessibility rules came
up again at the recent DR/XRG meeting in the UK:

    - The rules prevent "downward closures".

    - The rules are applied in an assume-the-worst manner in generic
      bodies.

These issues keep coming up, because they represent clearly-desirable
functionality, but with a substantial implementation cost (at least for
some implementations).  It is difficult to make the trade-off.  We
recommend not supporting downward closures, but solving the generic
problem by going over to run-time accessibility checks in generics.
Below, we discuss the issues, and the reasons for our recommendation.

LSN-1042, published in October, 1992, discusses in general the
relationship between accessibility checks and the generic contract
model.

----------------

Downward Closures:

Ada's access-to-subprogram types represent assignable subprogram
values.  They support the ability to create data structures
containing the identities of subprograms.  They support callbacks,
where one software component declares a subprogram and hands its
identity to another software component, which can then "call back"
the first software component at a later time.

These types do not support downward closures, where the identity of a
more nested subprogram can be passed to a less nested subprogram.
Downward closures can be used, for example, to implement iterators,
like this:

    package P is
        type Integer_Set is private;
        ... -- various operations
        type Integer_Action is access procedure(I: Integer);
        procedure Iterate(S: Integer_Set; Action: Integer_Action);
            -- Call Action on each member of S.
    end P;

    function Count(S: Integer_Set) return Natural is
        Result: Natural := 0;
        procedure Increment(I: Integer) is
        begin
            Result := Result + 1;
        end Increment;
    begin
        Iterate(S, Increment'Access); -- Illegal!
        return Result;
    end Count;

Increment'Access above is illegal, because Increment is more nested
than type Integer_Action.  It would be bad to have to make Increment
global, because that would require making the variable Result global.
It is a bad idea to force the use of global variables in a language
that has tasks -- in this example, Count would not be reentrant if
Result were global.

Note that this situation is typical of iterators and similar things
-- in typical use, the action passed to an iterator will often need
visibility upon a variable declared in the caller of the iterator, in
order to accumulate some sort of result.

The reason for the restriction is that access-to-subprogram types
allow assignment.  At the point where Iterate is called, we don't
know if the body of Iterate is going to save the access value in a
global variable.  We disallow passing the nested Increment procedure
because Iterate *might* save a pointer to it, and call it later,
after Increment (and the variable Result) no longer exist.

Pascal supports downward closures, but it does not support Ada's
callback style, because the identity of a subprogram being passed as
a parameter cannot be copied.  Modula-2 supports the callback style,
but not the downward closure style.  Languages like Lisp support a
much more general form of closure, which is not of interest to us,
since it requires "stack frames" to be heap allocated -- "heap
frames", really, in the general case.  C supports the callback style,
but since there are no nested functions, the question of downward
closures does not arise.  There are no tasks, either, in C, so making
variables global (or C's "static") does not cause as much trouble as
in Ada.  (Supporting threads in C presents some interesting
problems!)

Implementation Issues: In a language (like Ada) with nested
subprograms, each subprogram needs some way to access its environment
-- that is, the variables declared outside it.  There are two common
methods for representing this environment: static links, and
displays.  Most Ada compilers use static links, but some use
displays.  One key difference between the two methods is that static
links have a compile-time-known size (32 bits, on a typical machine),
whereas the size of a display depends on the maximum nesting depth
throughout the partition.

In an implementation with static links, an access-to-subprogram value
is generally represented as a pair of addresses -- the address of the
subprogram's code, and the static link.  However, given the current
Ada 9X rules, an access-to-subprogram type that is declared at
library level does not need the static link -- it can be represented
as just a code address.  This is nice, because it is efficient, and
it allows the representation to easily match C's typical
representation of function pointers.  Access-to-subprogram types
declared one level deeper than library level can also be represented
as just a code address.  However, two or more levels deeper requires
the static link.

In an implementation with displays, given the current rules, an
access-to-subprogram value can be represented as just a code address
-- it is not necessary to attach a display to each access value.
This also has the nice interoperability-with-C property.

It has been suggested that downward closures be supported by allowing
the Unchecked_Access attribute on subprograms.  (It is currently
allowed only on objects.)  The semantics would presumably be that the
access value returned is valid (the designated subprogram can be
called) so long as the designated subprogram (and it's containing
stack frame) still exist.  If it doesn't exist, a dereference would
be erroneous.  We do not recommend this method for these reasons:

    - It would be unsafe -- it is uncomfortable to introduce
      erroneousness to a high-level feature like downward closures.

    - It would require every access-to-subprogram value to carry a
      static link or display, because the nesting level of the
      designated subprogram would no longer be restricted at
      compile time.  This would break the interoperability-with-C
      property, and would be less efficient.  (One could imagine
      an implementation using a different representation when the
      Convention is C, but that adds complexity, and doesn't solve
      the efficiency problem in the Ada-only case.)

There are two ways to support downward closures that we would
recommend, if downward closures are to be supported.  Both ways
recognize the fact that downward closures really need a separate
feature from the callback style of access-to-subprograms:

    - Limited access-to-subprogram types.

    - Access-to-subprogram parameters.

Limited access types were in an earlier version of Ada 9X.  Since
they would be limited, assignment would be disallowed, so
downward-closure-style parameter passing could be allowed, and would
be safe.  The compiler would implement limited access-to-subprogram
types with a static link or display.  Non-limited ones would retain
the interoperability-with-C property.  The syntax would be something
like this:

    type Integer_Action is limited access procedure(I: Integer);
                        -- ^^^^^^^
    procedure Iterate(S: Integer_Set; Action: Integer_Action);
        -- Call Action on each member of S.

Type conversion from limited to non-limited would be illegal.

Access-to-subprogram parameters would be an extension to access
parameters (not to be confused with parameters of a named access
type).  The syntax would be something like this:

    procedure Iterate(S: Integer_Set;
                      Action: access procedure(I: Integer));
        -- Call Action on each member of S.

This is quite similar to the syntax used in Pascal.  As for any access
parameter, the access type would be anonymous.  Parameter type matching
would be based on the designated profile.  Access parameters already use
run-time accessibility checks -- this new kind would do likewise.  Thus,
assignment would be allowed, but the accessibility checks would prevent
dangling pointers.  In our iterator example, if Iterate were to try to
copy Action to a global variable (during the call to Iterate from
Count), then Constraint_Error would be raised, because Count.Increment
is too deeply nested.  The value of an access-to-subprogram parameter
would need to have a static link or display.  Library-level access types
could retain the interoperability-with-C property.

Of the two workable methods outlined above, the access-to-subprogram
parameters method would be somewhat simpler in Reference Manual
terms, because it would be able to piggyback on the existing access
parameters feature for its rules and semantics.  The implementation
complexity seems equivalent.  Efficiency of the limited access types
method would be slightly better, because it would not be necessary to
pass in information needed for the run-time accessibility checks.
Limited access types would be the first elementary limited types;
we would have to reword the parameter passing rules to avoid a
contradictory requirement to pass by copy and by reference.  (It
doesn't really matter which is done, actually.)

For implementations with static links, the implementation of either
method is straightforward.  Such implementations already need to
include a static link in at least some access-to-subprogram values,
and the downward-closure ones would be the same.

However, for implementations with displays, either method is an
implementation burden, because:

    - The current rules do not require carrying the display
      with the access-to-subprogram value, whereas the new
      rules would.

    - The size of the display is not easily known at compile time.
      It's size can be calculated at run time, or a maximum possible
      size can be known if there is a restriction on nesting depth.
      Either way, managing the passing of the unknown size display
      causes complexity.  It is certainly *possible* to implement:
      I know of at least one Pascal compiler that used displays,
      and implemented downward closures.  Gnu C supports nesting
      of functions as an extension, and supports pointers to them.
      And Ada has other features that require the management of
      unknown-sized data structures.  Nonetheless, the implementation
      is not trivial.

It should also be noted that an implementation with displays would be
less efficient than one with static links, because an indirect call
would require copying the entire display.  It's not that big of a
deal, however -- it just means copying several words of memory, and
there would be no *distributed* overhead.

Note also that there are workarounds: Subprograms can be passed as
generic formal parameters, so an iterator can often be implemented as
a generic formal parameter.  However, that prevents recursion, and is
more verbose.  A closure can also be represented as an
access-to-class-wide value.  In the iterator example, whatever state
is needed by the Action procedure would be encapsulated in a type
extension.  However, this method is even more verbose.

One possibility would be to not support downward closures directly as
access-to-subprogram types, but to make the workaround a bit easier.  In
particular, we could allow limited tagged types to be extended in a more
nested place than the parent type.  (The accessibility rule would remain
as is for non-limited type extensions.)  Instead, a rule would be needed
for allocators, similar to the current rule for types with access
discriminants.  This relaxation is possible for limited types, because
they have no assignment, and hence cannot be copied to a place where the
object lives longer than its type.  This would allow the following:

    package P is
        type Integer_Set is private;
        ... -- various operations
        type Closure is tagged limited null record;
        procedure Action(C: Closure; I Integer);
        procedure Iterate(S: Integer_Set; C: Closure);
            -- Call Do_It(Action, M) for each member M of S.
    end P;

    function Count(S: Integer_Set) return Natural is
        type My_Closure_Type is new Closure with null record; -- Illegal!
        Result: Natural := 0;
        procedure Action(C: Closure; I: Integer) is
        begin
            Result := Result + 1;
        end Action;
        My_Closure: My_Closure_Type;
    begin
        Iterate(S, My_Closure);
        return Result;
    end Count;

The above is less verbose than the current workaround, and has the
advantage that My_Closure_Type can be declared where it arguably belongs
-- inside Count.

The implementation could store the static link or display in each object
of a limited type extension that was declared at a more-nested level
than its parent.  Code would be added to each potentially primitive
subprogram of such a type, to move the static link or display from the
parameter object to its proper location.  Note that no change to the
call sites is necessary.  Nonetheless, there is some implementation
burden, because extra compiler work is needed for declaring these
objects, and in the bodies of the potentially primitive subprograms.

(The reason I say "potentially" primitive is that because of renaming,
we don't know in general while compiling a given subprogram that it will
be primitive.  Potentially primitive subprograms are essentially the
ones declared at the same nesting level, and that take a parameter of
the type.)

We (reluctantly) recommend not supporting downward closures, because:

    - An implementation burden exists for display-based implementations.

    - Workarounds exist.

    - It seems like too big a change to make to the language at this
      late date.

I say "reluctantly" recommend, because it is clear to me that the
functionality is desirable, and if we were designing Ada 9X from
scratch, we would support downward closures.

----------------

Generic Bodies:

Currently, accessibility rules are checked in generic bodies in an
assume-the-worst manner (see 3.10.2(16) and 12.3(12.p,12.q)).  This
means that such rules are checked assuming the instantiation will be
more nested than the generic unit.  This is unfortunate, since most
of the time, the instance will be at the same nesting level as the
generic unit; the rules are therefore more pessimistic than is
usually necessary.

In many cases, it is possible to work around the problem by moving
something from the generic body up to the private part.  For example,
suppose P is a subprogram declared in a generic body, and the body
wishes to do P'Access, of an access type declared outside the generic
unit.  This is illegal, but one can move the declaration of the
subprogram to the private part of the generic, and declare a
constant, also in the private part:

    P_Access: constant Global_Access_To_Subprogram_Type := P'Access;

Similarly, type extensions can be declared in the private part of the
generic when they are illegal in the body.

For the Access attribute for objects, a different workaround is
possible -- use Unchecked_Access instead.  Unfortunately, this throws
away all the benefit of the accessibility checks, and can lead to
erroneous execution if one is not careful.

Another case is for type conversion of an access type declared in the
generic unit (spec or body) to a type global to the generic unit.
This is currently illegal.  A possible workaround is to add a
conversion function to the generic body:

    function Convert(Ptr: access ...) return Global_Access_Type is
    begin
        return Global_Access_Type(Ptr);
    end Convert;

Then, if X is of the inner access type, Convert(X) will do a run-time
check, which will fail if and only if the current instance is more
nested than the generic unit.

All of the above workarounds are annoying, because they cause trouble
when converting a package into a generic package -- one has to apply
the above workarounds after having written and tested the non-generic
package.

One possible solution to the above problems is to define some extra
syntax, or a pragma that means "this generic unit will be
instantiated only at the same nesting level".  However, extra syntax
for a small issue seems like too big a change at this point.  And a
pragma with such a heavy semantic effect seems ugly.

Perhaps a better solution would be to say that all accessibility
rules are checked at run time in instance bodies.  (Doing things at
run time is one way of getting around generic contract model
problems in general, and it would work here.)  An implementation that
does macro expansion for generic instantiations could always detect
these "run-time" errors at macro-expansion time.  An implementation
that does code sharing would have to generate code to do the run-time
checks.

One problem with either solution is that there is a substantial
implementation burden for implementations that do universal generic
code sharing.  For such implementions, when a subprogram declared
inside the generic unit is called, it needs to be implicitly passed
an extra parameter that represents the current instance.  This extra
parameter is used by the subprogram to address variables declared in
the generic body.

Since such subprograms have a different calling convention, if one
allows access values pointing at them to escape outside the generic,
then one has a problem.

We recommend using the run-time accessibility checking solution.
However, because of the implementation problem, we recommend using this
solution only for access-to-object types, and not solving the problem
for access-to-subprogram types.  The implementation problem mentioned
above does not exist for access-to-object types.  This solution is not
entirely consistent, but note that it is already the case that
access-to-subprogram types are different: there are no anonymous
access-to-subprogram types, and there is no Unchecked_Access attribute
for these types.

----------------

SUMMARY:

Downward closures can be sensibly supported, and are quite desirable,
but cause implementation difficulties for display-based
implementations.  Therefore, we recommend not solving this problem.

Removing the accessibility restrictions on generic bodies can also be
sensibly supported, and is also desirable, but (for access-to-subprogram
types) presents severe implementation difficulties for implementations
that do universal sharing of generic bodies.  We recommend solving the
problem for access-to-object types by using run-time checks.  We
recommend not solving the problem for access-to-subprogram types.

-- 
Change robert to bob to get my real email address.  Sorry.