From: eachus@spectre.mitre.org (Robert I. Eachus)
Newsgroups: comp.lang.ada
Subject: Re: Garbage Collection in Ada
Date: 1996/10/24
Organization: The Mitre Corp., Bedford, MA.
References: <01bbb910$f1e73f60$829d6482@joy.ericsson.se>

(This post was started yesterday; I added a section at the end replying to
Richard A. O'Keefe's comments on the Interlisp-D compiler.)

> In article <199610181934142408603@dialup101-3-15.swipnet.se>,
> Lars Farm wrote:
>
> > - How large is the probability that this occurs?
>
> (Where "this" = "conservative GC fails to collect some garbage".)
> I don't see any way of assigning a numeric probability to it.  That's
> the main thing that makes *me* nervous about it.  I mean, there's
> nothing under the programmer's control that can let you know if or when
> this sort of bug might happen.

   We can make some average-case assumptions and see that the probability of
trouble in even moderate-sized programs is pretty high.  The probability that
a block is marked as uncollectable is the probability that something already
in the uncollectable set will be seen as a pointer into that block.  Assuming
(BIG assumption) that all bit patterns are equally likely, and 32-bit
pointers, the probability that any given word is seen as locking a particular
block is (addresses per block)/2**32.  If we use R for the number of words in
the required (uncollectable) space not used for real pointers, C for the
number of addresses in the collectable space, and B for the (average) number
of addresses per block, then the birthday effect takes over when R*C*B
approaches 2**32.  Basically this means that conservative GCs become useless
when available memory times required memory times block size is of the same
order as the number of possible pointer values.

   To take it one step further: if your program keeps 16K of non-pointer
"random" data space live on a 32-bit addressable machine, then even with
small allocated objects a "conservative" collector is so much junk.
(Uncollected garbage will, on average, lock down more uncollected garbage,
and the amount of uncollected garbage will grow without bound.)

   What about 64-bit machines?  Now another factor jumps out to bite you:
bit patterns in memory are not random.  In particular, low numbers are
favored, and are more likely to correspond to actual memory locations.  On a
machine with a 64-bit address space, my guess is that a conservative
collector is useful up to about a megabyte of non-pointer live data.

   Now let's go with an absolute best-case scenario for long-lived systems.
The application has exactly one word of live non-pointer storage whose value
happens to be a valid heap address and never changes.  Each dynamically
allocated object also contains one non-pointer word and a pointer that is
sometimes non-null.  It is easy to prove that over time more and more of the
store becomes unavailable, and eventually you run out of storage.  So
"conservative" collectors are of no use in applications where you want to do
your best to keep the application running.

   If you don't believe the above, write the program--I'll make sure you
have a good enough RNG--and run it.
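   A minimal sketch of the first-order arithmetic behind that experiment, in
C.  The sizes R_WORDS, C_ADDRS, B_ADDRS and the heap base HEAP_LO are made up
for illustration, the xorshift generator is just one "good enough RNG", and
only a single scan is modelled--not the cascade where pinned garbage pins
more garbage:

/* Monte Carlo sketch: how many heap blocks does one conservative scan of
 * R random non-pointer words pin by accident?  All sizes are invented for
 * illustration.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define R_WORDS   16384u        /* non-pointer words in the required space   */
#define C_ADDRS   (1u << 22)    /* addresses in the collectable heap         */
#define B_ADDRS   16u           /* (average) addresses per allocated block   */
#define HEAP_LO   0x10000000u   /* assumed base address of the heap          */

static uint32_t xs = 2463534242u;
static uint32_t xorshift32(void)        /* a "good enough RNG" for the test  */
{
    xs ^= xs << 13;
    xs ^= xs >> 17;
    xs ^= xs << 5;
    return xs;
}

int main(void)
{
    uint32_t nblocks = C_ADDRS / B_ADDRS;
    unsigned char *pinned = calloc(nblocks, 1);
    unsigned long hits = 0;

    if (!pinned)
        return 1;
    for (uint32_t i = 0; i < R_WORDS; i++) {
        uint32_t word = xorshift32();          /* a random non-pointer word  */
        if (word >= HEAP_LO && word - HEAP_LO < C_ADDRS) {
            uint32_t blk = (word - HEAP_LO) / B_ADDRS;
            if (!pinned[blk]) { pinned[blk] = 1; hits++; }
        }
    }
    printf("R=%lu words scanned, heap=%lu addresses, blocks of %lu addresses\n",
           (unsigned long)R_WORDS, (unsigned long)C_ADDRS,
           (unsigned long)B_ADDRS);
    printf("blocks falsely pinned: %lu (about %lu addresses of garbage kept)\n",
           hits, hits * (unsigned long)B_ADDRS);
    free(pinned);
    return 0;
}

With these made-up sizes R*C*B is about 2**40, well past the 2**32 point
above, so some garbage is expected to be pinned on essentially every scan.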
   Memory consumption (measured after GC runs, of course) will stay low
initially, but eventually you will lock in some garbage, and then memory
consumption rises rapidly.  You will probably turn the program off before it
crashes, because by then you are spending most of your time in garbage
collection.

   This lesson was learned, and learned well, in the early days of LISP
implementations.  Systems attempting to use conservative collectors
eventually died.  I remember one incident on a PDP-1 at MIT where this
happened in under an hour.  (The machine had 16K of 18-bit memory and a
large--for 1964--drum that could be directly addressed.  The LISP
implementation stored all objects on the drum, using if I remember correctly
22-bit pointers, but the drum had fewer than a million words.)

In article <54mrhq$bqn$1@goanna.cs.rmit.EDU.AU> ok@goanna.cs.rmit.EDU.AU
(Richard A. O'Keefe) writes:

> A conservative garbage collector was an intrinsic part of
> Interlisp-D.  All references from the heap were precise, but the
> stack was scanned conservatively (in order to make the processing
> of unboxed floats cheap).  The Quintus Prolog garbage collector
> had a similar problem with floats, and took the conservative
> route.

   This does not have the same abysmal characteristics.  As long as the heap
and stack are treated differently, the worst case is that the amount of junk
locked in depends on the number of non-pointer values on the stack (and the
size of heap structures), not on how long the program has been running.

   This particular "semi-conservative" GC style fell out of favor because you
had to munge records copied from the stack to the heap, and vice versa, and
special-case offsets depending on where the object was currently located.
The better solution was to always store (and GC) records and arrays the same
way no matter where they are located (usually starting with a one-word
pointer to a GC thunk), add a counter to the stack frame, and put all
pointers at the head of the data section (one possible layout is sketched at
the end of this post).  The GC code turns out to be very simple, you can
compact if you wish, and all you lose is one word per heap object and two
words per complex object on the stack.

--
                                        Robert I. Eachus

with Standard_Disclaimer;
use Standard_Disclaimer;
function Message (Text: in Clever_Ideas) return Better_Ideas is...
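   A rough sketch of that uniform layout, reading the description above
fairly literally.  The names, the frame link, and the thunk contents are
invented for illustration; this is not the actual Interlisp-D or Quintus
representation:

/* Sketch of a uniform object layout: every record or array starts with one
 * word pointing at a per-type GC descriptor, pointer fields come first, raw
 * data last, and each stack frame carries a count of its pointer slots.
 */
#include <stddef.h>

/* One descriptor ("GC thunk") per type, shared by every instance. */
struct gc_thunk {
    size_t n_pointers;      /* pointer fields sit at the head of the object  */
    size_t size_in_words;   /* total size, so a compacting GC can move it    */
};

/* Every record or array, on the heap or on the stack, begins with one word
 * pointing at its thunk; the collector scans exactly n_pointers words after
 * the header precisely and ignores the raw data that follows.
 */
struct object {
    const struct gc_thunk *thunk;   /* the one word each heap object costs   */
    /* void *pointer_fields[n_pointers];  then non-pointer data              */
};

/* Each stack frame groups its pointer slots at the head of its data area and
 * records how many there are, so the stack can also be scanned precisely
 * (no conservative guessing about unboxed floats).
 */
struct frame {
    struct frame *caller;   /* link so the collector can walk the stack      */
    size_t n_pointers;      /* the counter added to the stack frame          */
    /* void *pointer_slots[n_pointers];   then non-pointer locals            */
};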