From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham autolearn_force=no version=3.4.4 X-Google-Language: ENGLISH,ASCII-7-bit X-Google-Thread: 103376,3ccb707f4c91a5f2 X-Google-Attributes: gid103376,public From: dewar@merv.cs.nyu.edu (Robert Dewar) Subject: Re: Unbounded strings (Was: Java vs Ada 95 (Was Re: Once again, Ada absent from DoD SBIR solicitation)) Date: 1996/11/24 Message-ID: X-Deja-AN: 198417859 references: <325BC3B3.41C6@hso.link.com> organization: New York University newsgroups: comp.lang.ada Date: 1996-11-24T00:00:00+00:00 List-Id: iRobert Eachus says " Suppose one task creates an unbounded string, and passes it to another task through a rendezvous. Both task create copies, then modify different sections of the original string. Definitely erroneous, but it will make lazy reference counting visible." The "definitely erroneous" here seems too strong to me. The point is that we have two basic possible semantics of unbounded strings, one is to copy on all assignments, and the other is to share copies, and copy only on a modification. I am completely unable to determine that one of these implementations is somehow more reasonable, natural, expected, required or whatever, they both seem quite reasonable to me. But if we copy only on modification, then it is indeed true that the above scenario is erroneous because the "create copies" presumably refers to doing assignment statements, which do not create copies at the implementation level. Bob tries to argue that this implemntation is somehow improper, but I don't see that at all. Indeed copy on modify seems quite a reasonable implementation to me (it is for example what all SNOBOL4 and SPITBOL implementations have done). In a garbage collected environment it is also the most natural implementation. But it certainly does have consequences with regard to thread safety! The trouble here is that of course we expect to have proper value semantics (and that indeed is why Bob's argument about other examples, such as 1+1 not working are irrelevant, everyone agrees on the basic single thread value semantics). [[oops to clarify, Bob in the above = Bob Duff]] The trouble is that pure value semantics breaks down in a tasking environment because of the shared variable issues, and we introduce the notion of erroneous access to shared variables to patch this up -- not a very satisfactory solution, but a pragmatic one, since the only other alternative is to put locks on all variables all the time. Now when we get to Unbounded_String's, some kind of similar patch up is required, but it definitely depends on the implementation model. Since the RM text does not specify the implementation model, we are unable to determine exactly the scope of the corresponding pragmatic solution. Going back to Robert Eachus' claim of "definitely erroneous", I guess I *do* agree in the following sense: Unbounded_String's are built into the language from a semantic point of view. There are two possible implementations that are reasonable For one of these implementations, the given operation *is* erroneous Therefore, since one allowed implementation is erroneous, the set f possible effects is erroneous, even if in some implementations it might actually work (working is certainly one of the allowed behaviors for an erroneous execution -- the same is true after all of any shared variable erroneousity -- most of the time things are just fine. One thing is clear, this should have been discussed during the language design. The implementation technique of using transparent sharing via pointers, with copy on modify rather than copy on assign is very familiar, especially in the realm of variable length string processing, so we shuld have anticipated this model, an discussed its interaction with tasking. --- The argument that this is NOT erroneous works something like this: >From a semantic point of view, assignment involves a copy. Yes, you can "cheat" and not do a copy, but only if "as if" semantic equivalence is maintained. The copy-on-modify approach, though it might be fine (i.e. have fine "as if" semantics in a single threaded environment) does NOT work fine in a multi-threaded environment since it does not provide "as if" semantics. I find some merit in this argument, but I do not find it decisive, because at the language level we have already introduced the notion of allowing assignments and references to become erroneous as a result of shared variable reference -- and we did it on pragmatic grounds, there would have been no difficulty *semantically* in requiring assignment to work properly in a threaded environment, even if shared variables are accessed, and no difficulty in implementing it -- just ghastly efficiency issues. Well here we sure have a similar case -- we have no trouble agreeing that Bob Duff's claimed obvious semantics are fine, and that we can implement them with no trouble -- but the implementation has a ghastly efficiency problem (if taking locks is expensive). Note that it is not just reference counts that cause this problem. Even if we use what I would guess is the expected implementation of unbounded strings which is to use controlled types (as in GNAT), we are in trouble. Controlled types themselves implicitly reference global shared data structures, or at least may do so in the model where chains of objects to be finalized are maintained. If these chains must be protected against multi-threading, there are again ghastly efficiency implications in systems where taking a lock is expensive. Of course if you carry the argument I and Robert Eachus are lmakig to logical extremes, since type controlled is again just a type in a library package whose implementation is hidden, it could be argued that it is not necessary to take locks even in the controlled case, but this would introduce an even more dramatic restriction on what could be done between tasks (basically you would require controlled collections to be task specific -- and that seems awfully restrictive). Still, we definitely need more discussion here. This is not the first time we have run into issues of having underspecified the preefined library routines. Two previous cases are: 1. The interaction of predefined private types with stream attributes. Do these predefined types have stream attributes that "work" in an expected way. The RM does not ensure this (so for example using Read and Write with unbounded string is not rquired to work by the RM, but surely (?) it should, so an AI has been approved that requires it to work -- this AI is a binding interpretation, a recognition that the RM was inadequate here. 2. The package categorizatoin with respect to distribution pragmas was not thought through. The result is that the RM does not just fail to guarantee that certain types cannot be used with full flexibility in a distributed environment -- it REQUIRES that they NOT be able to be used. Again the ARG is looking into this. Reasoning by what is friendly is never a safe occupation in the world of language semantics. Certainly I think most poeple would agree that the stream attributes SHOULD work for unbounded strings. The RM authors either agreed with this explicitly, or implicitly, or simply did not think about it. But the RM does NOT pin this down. I think the business with reference counts and shared unbounded strings between tasks is very much in this category, and certainly needs an ARG discussion. -- There is an interesting lesson for package and interface design here. Some VERY detailed thought is required to properly document exactly what the semantics of a package is when it is used in a multi-threaded environment. I think we have been far too casual about this in the past. Robert