Path: utzoo!utgpu!watmath!clyde!ima!think!barmar
From: barmar@think.COM (Barry Margolin)
Newsgroups: comp.lang.ada
Subject: Re: Garbage Collection
Message-ID: <35328@think.UUCP>
Date: 11 Jan 89 09:22:46 GMT
References: <35300@think.UUCP> <4054@hubcap.UUCP>
Sender: news@think.UUCP
Reply-To: barmar@kulla.think.com.UUCP (Barry Margolin)
Organization: Thinking Machines Corporation, Cambridge MA, USA

In article <4054@hubcap.UUCP> billwolf@hubcap.clemson.edu writes:
> Let's assume the existence of a suitably implemented warning container,
> which has been appropriately hardened for use as a shared variable.
> Our compiler proceeds by read-locking the file and write-locking the
> warning structure; having acquired the locks, the warning structure
> for that particular file is updated, and the locks are then released.
> Our editor proceeds by write-locking the file and read-locking the
> warning structure, updating the file, and then releasing all locks.
> Our warning destroyer proceeds by write-locking and then destroying
> the specified warning structure(s).
>
> That was almost too easy.  Did I misunderstand the problem?

First of all, whether the file is locked is immaterial to the
discussion (I never actually said that the compiler compiles files --
in Lisp the compiler can also be invoked on in-core interpreted
functions, and many Lisp programming environments allow in-core editor
buffers to be compiled).  However, I did imagine locking of the
COMPILER_WARNINGS database.  I expected that it would be read-locked
during the SELECT operation, and write-locked during ADD and DELETE.
But once SELECT returns, the lock would be released.  The lock serves
to protect the collection structure (probably either an array or a
linked list), but not the individual elements (since there are no
operations that modify an existing warning object).

So, I envision that the editor proceeds by read-locking the warnings
structure, selecting the appropriate warnings, and then releasing the
lock.  It then displays the selected warnings to the user and allows
him to perform operations such as "go to line referenced in next
warning".

In many traditional systems the warnings database is a text file
containing the output of the compiler, and the SELECT operation works
by reading this file and parsing it into a set of warning structures
internal to the editor.  The command processor's DELETE_WARNINGS
operation is then just a Delete File command.  However, this mechanism
effectively involves a copy operation (the contents of the warning
file are copied into the editor's memory), and it is this copy that
I'm trying to avoid by having the editor and compiler in the same
address space and allowing them to share the warning object
structures.

If we do put read and write locks on the individual warning objects,
we have effectively implemented reference counts.  The multiple-reader
locking schemes I'm familiar with increment a counter or add an
element to some list every time a reader grabs the lock, and do the
opposite when unlocking, so that they will know when all the readers
are gone.  So, if your answer is that reference counts are the right
solution, just say so.
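Concretely, here is a minimal sketch of the discipline I have in mind,
written in C with POSIX read/write locks.  Every name in it is
illustrative, not taken from any actual compiler or editor:

    /* Sketch only: illustrative names, not from any real system.
       The lock guards the collection; SELECT copies out pointers under
       a read lock and then releases it, so the caller ends up holding
       bare references to warning objects the collection also owns. */
    #include <pthread.h>
    #include <stddef.h>

    struct warning {
        int  line;           /* source line the warning refers to         */
        char text[256];      /* message text; never modified once created */
    };

    #define MAX_WARNINGS 1024

    static struct warning  *warnings[MAX_WARNINGS];   /* shared collection */
    static size_t           warning_count = 0;
    static pthread_rwlock_t db_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Compiler side: ADD write-locks the collection just long enough to
       link in the new object. */
    void add_warning(struct warning *w)
    {
        pthread_rwlock_wrlock(&db_lock);
        if (warning_count < MAX_WARNINGS)
            warnings[warning_count++] = w;
        pthread_rwlock_unlock(&db_lock);
    }

    /* Editor side: SELECT read-locks the collection, copies out matching
       pointers, and releases the lock before returning.  The warning
       objects themselves are shared, not copied; nothing protects them
       once the lock is dropped. */
    size_t select_warnings(int first_line, int last_line,
                           struct warning **out, size_t max)
    {
        size_t n = 0;

        pthread_rwlock_rdlock(&db_lock);
        for (size_t i = 0; i < warning_count && n < max; i++)
            if (warnings[i]->line >= first_line &&
                warnings[i]->line <= last_line)
                out[n++] = warnings[i];
        pthread_rwlock_unlock(&db_lock);
        return n;            /* caller now holds bare references */
    }

    /* The warning destroyer: DELETE write-locks the collection and
       empties it.  Whether the objects it used to point at can be freed
       is exactly the open question. */
    void delete_warnings(void)
    {
        pthread_rwlock_wrlock(&db_lock);
        warning_count = 0;
        pthread_rwlock_unlock(&db_lock);
    }

Note that db_lock protects only the collection of pointers.  If we go
further and put a multiple-reader lock on each warning object as well,
the reader count inside that per-object lock is a reference count in
everything but name.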
It may well be that all well-designed applications can be implemented
using either manual or ref-count-based deallocation.  However,
reference counts add a significant fixed overhead to every assignment
of a ref-count-protected data type; what was probably a single
instruction becomes:

    if target not null then
       decrement target.refcount;
       if target.refcount = 0 then
          deallocate target.all;
       end if;
    end if;
    increment source.refcount;
    target := source;

Most modern GC schemes have time overhead that is a function (often
linear) of the frequency of allocation.  Since assignments are always
more frequent than allocations, and I suspect usually MUCH more
frequent, this difference is important.

Ada prevents many of the worst types of pointer problems by only
allowing pointers to be created by the heap allocation primitive
(NEW).  In other languages, where pointers to existing objects may be
created (C's "&", PL/I's ADDR), things can get pretty bad without a
GC.  I ran into such a problem last year when I was trying to use a
subroutine library, originally designed for displaying a database, in
a facility for editing that database.

The database is implemented as a text file, with each line
representing a record and each field of the record separated by a
delimiter character.  The programs are in C, and the subroutine for
reading the database into memory read each line into a newly allocated
string, replaced each delimiter with a null character (C's string
terminator), and allocated structures containing pointers to the first
character of each field.  This was fine for the read-only
applications, as they could simply deallocate all the strings and
record structures that were allocated (they didn't actually bother,
since the programs run on Unix and they depended on everything being
deallocated when the program terminated).

I tried to write a program that reads in this database and then allows
the user to edit fields.  If the new field value was shorter than the
original, I simply overwrote the original value; if not, I allocated a
new string for the new value and changed the record's field pointer to
point there instead of into the middle of the original line.  But when
it came time to deallocate, I needed to know whether individual fields
had to be deallocated independently of the lines.  Had I gotten around
to finishing this program, I probably would have added a parallel data
structure containing flags indicating whether each field had been
reallocated (a sketch of the sort of thing I mean appears below).

In general, what I see coming out of this is that manual deallocation
may always be possible, but it frequently requires a non-trivial
amount of extra coding to provide bookkeeping that lets the program
know when it is safe to deallocate things.  Remember, every additional
line of code in a program increases its complexity and is another
potential failure point.  And details such as storage management are
generally uninteresting to the application programmer, who is probably
more interested in the problem domain, so it is likely that he won't
be as careful when designing, coding, and testing them (storage
management is also difficult to test, unless you or the runtime
library include extra mechanisms just to improve debuggability).
Garbage collectors, on the other hand, are usually implemented by
system programmers who find them interesting and can be expected to
care enough to do the very best.
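To show concretely what I mean by bookkeeping, here is roughly the
kind of structure I expect I would have ended up writing for that
database editor.  It is only a sketch, and the names are mine rather
than the library's:

    /* Sketch only: illustrative names, not the library's actual ones.
       Each record keeps, alongside its field pointers, a flag per field
       saying whether that pointer still points into the original line
       buffer or into separately allocated storage. */
    #include <stdlib.h>
    #include <string.h>

    #define MAX_FIELDS 16

    struct record {
        char *line;                     /* original line; delimiters now NULs    */
        char *field[MAX_FIELDS];        /* pointers into line, or to new strings */
        int   reallocated[MAX_FIELDS];  /* nonzero if field[i] was malloc'ed     */
        int   nfields;
    };

    /* Replace field i of record r with a new value.  A shorter value is
       written over the old one in place; a longer one gets freshly
       allocated storage, and the flag records that fact for
       deallocation time. */
    int set_field(struct record *r, int i, const char *value)
    {
        if (strlen(value) <= strlen(r->field[i])) {
            strcpy(r->field[i], value);
        } else {
            char *copy = malloc(strlen(value) + 1);
            if (copy == NULL)
                return -1;
            strcpy(copy, value);
            if (r->reallocated[i])      /* don't leak an earlier replacement */
                free(r->field[i]);
            r->field[i] = copy;
            r->reallocated[i] = 1;
        }
        return 0;
    }

    /* Deallocation has to consult the flags: free only the fields that
       no longer point into the line buffer, then free the line itself. */
    void free_record(struct record *r)
    {
        for (int i = 0; i < r->nfields; i++)
            if (r->reallocated[i])
                free(r->field[i]);
        free(r->line);
    }

Every line of this is pure storage management; none of it has anything
to do with the contents of the database being edited, and all of it is
one more place for the program to go wrong.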
And if you don't think the psychological factors described above are
important, then you're working with programmers of a different species
than I'm familiar with (that reminds me, I wonder where my copy of
"The Psychology of Computer Programming" is -- probably in my parents'
house, unfortunately, or I'd look up references), and I wouldn't want
to work with them!

In conclusion, I expect that I can manage storage as well as you can.
I could also manually convert a number to a character string and send
the characters to an I/O device, but every major language provides
runtime support for that so we don't all have to waste our time
writing it.  What's so special about storage management that we should
all be burdened with it?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar