Path: utzoo!utgpu!watmath!clyde!ima!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.lang.ada
Subject: Re: Garbage Collection
Message-ID: <35328@think.UUCP>
Date: 11 Jan 89 09:22:46 GMT
References: <35300@think.UUCP> <4054@hubcap.UUCP>
Sender: news@think.UUCP
Reply-To: barmar@kulla.think.com.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge MA, USA

In article <4054@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
> Let's assume the existence of a suitably implemented warning container,
> which has been appropriately hardened for use as a shared variable.
> Our compiler proceeds by read-locking the file and write-locking the
> warning structure; having acquired the locks, the warning structure
> for that particular file is updated, and the locks are then released.
> Our editor proceeds by write-locking the file and read-locking the
> warning structure, updating the file, and then releasing all locks.
> Our warning destroyer proceeds by write-locking and then destroying
> the specified warning structure(s).
>
> That was almost too easy.  Did I misunderstand the problem?

First of all, whether the file is locked is immaterial to the
discussion (I never actually said that the compiler compiles files --
in Lisp the compiler can also be invoked on in-core interpreted
functions, and many Lisp programming environments allow in-core editor
buffers to be compiled).  However, I did imagine locking of the
COMPILER_WARNINGS database.  I expected that it would be read-locked
during the SELECT operation, and write-locked during ADD and DELETE.
But once SELECT returns, the lock would be released.  The lock serves
to protect the collection structure (probably either an array or a
linked list), but not the individual elements (since there are no
operations that modify an existing warning object).

So, I envision that the editor proceeds by read-locking the warnings
structure, selecting the appropriate warnings, and then releasing the
lock.  It then displays the selected warnings to the user and allows
him to perform operations such as "go to line referenced in next
warning".

In many traditional systems the warnings database is a text file
containing the output of the compiler, and the SELECT operation works
by reading this file and parsing it into a set of warning structures
internal to the editor.  The command processor's DELETE_WARNINGS
operation is then just a Delete File command.  However, this mechanism
effectively involves a copy operation (the contents of the warning
file are copied into the editor's memory), and it is this copy that
I'm trying to avoid by having the editor and compiler in the same
address space and allowing them to share the warning object
structures.

If we do put read and write locks on the individual warning objects,
we have effectively implemented reference counts.  The multiple-reader
locking schemes I'm familiar with increment a counter or add an
element to some list every time a reader grabs the lock, and do the
opposite when unlocking, so that they will know when all the readers
are gone.  So, if your answer is that reference counts are the right
solution, just say so.
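Concretely, here is a minimal sketch of the discipline I have in mind,
written in C with POSIX read/write locks.  Every name in it is
illustrative, not taken from any actual compiler or editor:

    /* Sketch only: illustrative names, not from any real system.
       The lock guards the collection; SELECT copies out pointers under
       a read lock and then releases it, so the caller ends up holding
       bare references to warning objects the collection also owns. */
    #include <pthread.h>
    #include <stddef.h>

    struct warning {
        int  line;           /* source line the warning refers to         */
        char text[256];      /* message text; never modified once created */
    };

    #define MAX_WARNINGS 1024

    static struct warning  *warnings[MAX_WARNINGS];   /* shared collection */
    static size_t           warning_count = 0;
    static pthread_rwlock_t db_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Compiler side: ADD write-locks the collection just long enough to
       link in the new object. */
    void add_warning(struct warning *w)
    {
        pthread_rwlock_wrlock(&db_lock);
        if (warning_count < MAX_WARNINGS)
            warnings[warning_count++] = w;
        pthread_rwlock_unlock(&db_lock);
    }

    /* Editor side: SELECT read-locks the collection, copies out matching
       pointers, and releases the lock before returning.  The warning
       objects themselves are shared, not copied; nothing protects them
       once the lock is dropped. */
    size_t select_warnings(int first_line, int last_line,
                           struct warning **out, size_t max)
    {
        size_t n = 0;

        pthread_rwlock_rdlock(&db_lock);
        for (size_t i = 0; i < warning_count && n < max; i++)
            if (warnings[i]->line >= first_line &&
                warnings[i]->line <= last_line)
                out[n++] = warnings[i];
        pthread_rwlock_unlock(&db_lock);
        return n;            /* caller now holds bare references */
    }

    /* The warning destroyer: DELETE write-locks the collection and
       empties it.  Whether the objects it used to point at can be freed
       is exactly the open question. */
    void delete_warnings(void)
    {
        pthread_rwlock_wrlock(&db_lock);
        warning_count = 0;
        pthread_rwlock_unlock(&db_lock);
    }

Note that db_lock protects only the collection of pointers.  If we go
further and put a multiple-reader lock on each warning object as well,
the reader count inside that per-object lock is a reference count in
everything but name.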
It may well be that all well-designed applications can be implemented
using either manual or ref-count-based deallocation.  However,
reference counts add a significant fixed overhead to every assignment
of a ref-count-protected data type; what was probably a single
instruction becomes:

    if target not null then
       decrement target.refcount;
       if target.refcount = 0 then
          deallocate target.all;
       end if;
    end if;
    increment source.refcount;
    target := source;

Most modern GC schemes have time overhead that is a function (often
linear) of the frequency of allocation.  Since assignments are always
more frequent than allocations, and I suspect usually MUCH more
frequent, this difference is important.

Ada prevents many of the worst types of pointer problems by only
allowing pointers to be created by the heap allocation primitive
(NEW).  In other languages, where pointers to existing objects may be
created (C's "&", PL/I's ADDR), things can get pretty bad without a
GC.  I ran into such a problem last year when I was trying to use a
subroutine library, originally designed for displaying a database, in
a facility for editing that database.

The database is implemented as a text file, with each line
representing a record and each field of the record separated by a
delimiter character.  The programs are in C, and the subroutine for
reading the database into memory read each line into a newly allocated
string, replaced each delimiter with a null character (C's string
terminator), and allocated structures containing pointers to the first
character of each field.  This was fine for the read-only
applications, as they could simply deallocate all the strings and
record structures that were allocated (they didn't actually bother,
since the programs run on Unix and they depended on everything being
deallocated when the program terminated).

I tried to write a program that reads in this database and then allows
the user to edit fields.  If the new field value was shorter than the
original, I simply overwrote the original value; if not, I allocated a
new string for the new value and changed the record's field pointer to
point there instead of into the middle of the original line.  But when
it came time to deallocate, I needed to know whether individual fields
had to be deallocated independently of the lines.  Had I gotten around
to finishing this program, I probably would have added a parallel data
structure containing flags indicating whether each field had been
reallocated (a sketch of the sort of thing I mean appears below).

In general, what I see coming out of this is that manual deallocation
may always be possible, but it frequently requires a non-trivial
amount of extra coding to provide bookkeeping that lets the program
know when it is safe to deallocate things.  Remember, every additional
line of code in a program increases its complexity and is another
potential failure point.  And details such as storage management are
generally uninteresting to the application programmer, who is probably
more interested in the problem domain, so it is likely that he won't
be as careful when designing, coding, and testing them (storage
management is also difficult to test, unless you or the runtime
library include extra mechanisms just to improve debuggability).
Garbage collectors, on the other hand, are usually implemented by
system programmers who find them interesting and can be expected to
care enough to do the very best.
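To show concretely what I mean by bookkeeping, here is roughly the
kind of structure I expect I would have ended up writing for that
database editor.  It is only a sketch, and the names are mine rather
than the library's:

    /* Sketch only: illustrative names, not the library's actual ones.
       Each record keeps, alongside its field pointers, a flag per field
       saying whether that pointer still points into the original line
       buffer or into separately allocated storage. */
    #include <stdlib.h>
    #include <string.h>

    #define MAX_FIELDS 16

    struct record {
        char *line;                     /* original line; delimiters now NULs    */
        char *field[MAX_FIELDS];        /* pointers into line, or to new strings */
        int   reallocated[MAX_FIELDS];  /* nonzero if field[i] was malloc'ed     */
        int   nfields;
    };

    /* Replace field i of record r with a new value.  A shorter value is
       written over the old one in place; a longer one gets freshly
       allocated storage, and the flag records that fact for
       deallocation time. */
    int set_field(struct record *r, int i, const char *value)
    {
        if (strlen(value) <= strlen(r->field[i])) {
            strcpy(r->field[i], value);
        } else {
            char *copy = malloc(strlen(value) + 1);
            if (copy == NULL)
                return -1;
            strcpy(copy, value);
            if (r->reallocated[i])      /* don't leak an earlier replacement */
                free(r->field[i]);
            r->field[i] = copy;
            r->reallocated[i] = 1;
        }
        return 0;
    }

    /* Deallocation has to consult the flags: free only the fields that
       no longer point into the line buffer, then free the line itself. */
    void free_record(struct record *r)
    {
        for (int i = 0; i < r->nfields; i++)
            if (r->reallocated[i])
                free(r->field[i]);
        free(r->line);
    }

Every line of this is pure storage management; none of it has anything
to do with the contents of the database being edited, and all of it is
one more place for the program to go wrong.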
And if you don't think the psychological factors described above are
important, then you're working with programmers of a different species
than I'm familiar with (that reminds me, I wonder where my copy of
"The Psychology of Computer Programming" is -- probably in my parents'
house, unfortunately, or I'd look up references), and I wouldn't want
to work with them!

In conclusion, I expect that I can manage storage as well as you can.
I could also manually convert a number to a character string and send
the characters to an I/O device, but every major language provides
runtime support for that so we don't all have to waste our time
writing it.  What's so special about storage management that we should
all be burdened with it?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar