From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,3ccb707f4c91a5f2
X-Google-Attributes: gid103376,public
From: dewar@merv.cs.nyu.edu (Robert Dewar)
Subject: Re: Unbounded strings (Was: Java vs Ada 95 (Was Re: Once again,
 Ada absent from DoD SBIR solicitation))
Date: 1996/11/24
Message-ID: <dewar.848839209@merv>
X-Deja-AN: 198417859
references: <325BC3B3.41C6@hso.link.com> <dewar.848202876@merv>
 <E11C57.EF2@world.std.com> 	<dewar.848352010@merv> <E14DCH.5EK@world.std.com>
 <EACHUS.96Nov20190244@spectre.mitre.org>
organization: New York University
newsgroups: comp.lang.ada
Date: 1996-11-24T00:00:00+00:00
List-Id: <comp.lang.ada>


iRobert Eachus says

"      Suppose one task creates an unbounded string, and passes it to
another task through a rendezvous.  Both task create copies, then
modify different sections of the original string.  Definitely
erroneous, but it will make lazy reference counting visible."


The "definitely erroneous" here seems too strong to me. The point is that
we have two basic possible semantics of unbounded strings, one is to copy
on all assignments, and the other is to share copies, and copy only on a
modification.

I am completely unable to determine that one of these implementations is
somehow more reasonable, natural, expected, required or whatever, they
both seem quite reasonable to me.

But if we copy only on modification, then it is indeed true that the above
scenario is erroneous because the "create copies" presumably refers to
doing assignment statements, which do not create copies at the implementation
level. 

Bob tries to argue that this implemntation is somehow improper, but I don't
see that at all. Indeed copy on modify seems quite a reasonable 
implementation to me (it is for example what all SNOBOL4 and SPITBOL
implementations have done). In a garbage collected environment it is
also the most natural implementation.

But it certainly does have consequences with regard to thread safety!

The trouble here is that of course we expect to have proper value
semantics (and that indeed is why Bob's argument about other examples,
such as 1+1 not working are irrelevant, everyone agrees on the basic
single thread value semantics).

[[oops to clarify, Bob in the above  = Bob Duff]]

The trouble is that pure value semantics breaks down in a tasking
environment because of the shared variable issues, and we introduce
the notion of erroneous access to shared variables to patch this
up -- not a very satisfactory solution, but a pragmatic one, since
the only other alternative is to put locks on all variables all the
time.

Now when we get to Unbounded_String's, some kind of similar patch up
is required, but it definitely depends on the implementation model.
Since the RM text does not specify the implementation model, we are
unable to determine exactly the scope of the corresponding pragmatic
solution.

Going back to Robert Eachus' claim of "definitely erroneous", I guess
I *do* agree in the following sense:

Unbounded_String's are built into the language from a semantic point of view.
There are two possible implementations that are reasonable
For one of these implementations, the given operation *is* erroneous
Therefore, since one allowed implementation is erroneous, the set f
 possible effects is erroneous, even if in some implementations it
 might actually work (working is certainly one of the allowed
 behaviors for an erroneous execution -- the same is true after
 all of any shared variable erroneousity -- most of the time things
 are just fine.

One thing is clear, this should have been discussed during the language
design. The implementation technique of using transparent sharing via
pointers, with copy on modify rather than copy on assign is very
familiar, especially in the realm of variable length string processing,
so we shuld have anticipated this model, an discussed its interaction
with tasking.

---

The argument that this is NOT erroneous works something like this:


>From a semantic point of view, assignment involves a copy. Yes, you can
"cheat" and not do a copy, but only if "as if" semantic equivalence is
maintained.

The copy-on-modify approach, though it might be fine (i.e. have fine
"as if" semantics in a single threaded environment) does NOT work fine
in a multi-threaded environment since it does not provide "as if"
semantics.

I find some merit in this argument, but I do not find it decisive, because
at the language level we have already introduced the notion of allowing
assignments and references to become erroneous as a result of shared
variable reference -- and we did it on pragmatic grounds, there would
have been no difficulty *semantically* in requiring assignment to work
properly in a threaded environment, even if shared variables are 
accessed, and no difficulty in implementing it -- just ghastly
efficiency issues.

Well here we sure have a similar case -- we have no trouble agreeing that
Bob Duff's claimed obvious semantics are fine, and that we can implement
them with no trouble -- but the implementation has a ghastly efficiency
problem (if taking locks is expensive).

Note that it is not just reference counts that cause this problem. Even if
we use what I would guess is the expected implementation of unbounded
strings which is to use controlled types (as in GNAT), we are in trouble.

Controlled types themselves implicitly reference global shared data
structures, or at least may do so in the model where chains of objects
to be finalized are maintained. If these chains must be protected against
multi-threading, there are again ghastly efficiency implications in 
systems where taking a lock is expensive. 

Of course if you carry the argument I and Robert Eachus are lmakig to logical
extremes, since type controlled is again just a type in a library package
whose implementation is hidden, it could be argued that it is not necessary
to take locks even in the controlled case, but this would introduce an even
more dramatic restriction on what could be done between tasks (basically
you would require controlled collections to be task specific -- and that
seems awfully restrictive).

Still, we definitely need more discussion here.

This is not the first time we have run into issues of having underspecified
the preefined library routines. Two previous cases are:

  1. The interaction of predefined private types with stream attributes. Do
     these predefined types have stream attributes that "work" in an 
     expected way. The RM does not ensure this (so for example using
     Read and Write with unbounded string is not rquired to work by the
     RM, but surely (?) it should, so an AI has been approved that 
     requires it to work -- this AI is a binding interpretation, a
     recognition that the RM was inadequate here.

  2. The package categorizatoin with respect to distribution pragmas was
     not thought through. The result is that the RM does not just fail to
     guarantee that certain types cannot be used with full flexibility in
     a distributed environment -- it REQUIRES that they NOT be able to be
     used. Again the ARG is looking into this.


Reasoning by what is friendly is never a safe occupation in the world
of language semantics. Certainly I think most poeple would agree that
the stream attributes SHOULD work for unbounded strings. The RM authors
either agreed with this explicitly, or implicitly, or simply did not
think about it. But the RM does NOT pin this down.

I think the business with reference counts and shared unbounded strings
between tasks is very much in this category, and certainly needs an
ARG discussion.

--

There is an interesting lesson for package and interface design here. Some
VERY detailed thought is required to properly document exactly what the
semantics of a package is when it is used in a multi-threaded environment.
I think we have been far too casual about this in the past.

Robert