From mboxrd@z Thu Jan 1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on polar.synack.me
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00 autolearn=ham
	autolearn_force=no version=3.4.4
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 103376,3498dd887729ed19
X-Google-Attributes: gid103376,public
From: Hans-Juergen Boehm
Subject: Re: Garbage Collection in Ada
Date: 1996/10/24
Message-ID: <32701284.167E@mti.sgi.com>
X-Deja-AN: 191983664
references: <01bbb910$f1e73f60$829d6482@joy.ericsson.se>
cc: boehm
content-type: text/plain; charset=us-ascii
organization: Silicon Graphics Inc., Mountain View, CA
mime-version: 1.0
newsgroups: comp.lang.ada
x-mailer: Mozilla 2.0S (X11; I; IRIX 6.2 IP22)
Date: 1996-10-24T00:00:00+00:00
List-Id: 

There are many serious problems with this analysis, including the fact
that it's blatantly inconsistent with any recent practical experience.
I'll reply to the more theoretical ones here.  Anyone who has any
doubts about this is more than welcome to try the experiment.  A
suitable collector for C can be found in:
http://reality.sgi.com/employees/boehm_mti/gc_source/gc.tar.gz

Robert I. Eachus wrote:
> In article <199610181934142408603@dialup101-3-15.swipnet.se>,
> Lars Farm wrote:
>
> >- How large is the probability that this occurs?
> > (Where "this" = "conservative GC fails to collect some garbage".)
>
> We can make some average case assumptions and see that the
> probability of trouble in even moderate sized programs is pretty
> high.  The probability a block is marked as uncollectable is the
> probability that any space already in the uncollectable set has
> something that will be seen as a pointer to that block.  Assuming
> (BIG assumption) that all bit patterns are equally likely, and
> 32-bit pointers, the probability that any word is seen as locking a
> block is 1/2**32/(addresses per block).

That is a big assumption.
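Even taking that assumption at face value, the arithmetic is less dire
than it sounds.  A rough sketch (the 16K-of-junk / 256K-heap sizes
match the example further down in this reply; everything else is
illustrative):

```python
# Quoted worst-case model: every non-pointer word is a uniformly random
# 32-bit pattern, and it is mistaken for a pointer whenever it happens
# to land inside the collectable heap's address range.
HEAP_BYTES = 256 * 1024            # illustrative 256K collectable heap
JUNK_WORDS = (16 * 1024) // 4      # 16K of non-pointer data, as 32-bit words

p_hit = HEAP_BYTES / 2**32         # chance one random word lands in the heap
expected_bogus_pointers = JUNK_WORDS * p_hit
print(expected_bogus_pointers)     # 0.25
```

A quarter of a bogus pointer in expectation, even counting every
interior heap address as a hit.  And the uniform-bit-pattern model
itself overstates the danger, for the reasons that follow.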
In fact, on modern machines valid heap addresses generally occur
rarely in other positions.  Small integers are rarely valid heap
addresses.  Neither are floating point numbers or character strings.
More importantly, a good conservative collector will allocate around
previously seen bogus pointers, reducing the probability substantially
further.  See my PLDI 93 paper for measurements.  This is, however,
not the main problem with the analysis.

> If we use R to indicate the number of words in the required
> (uncollectable) space not used for real pointers, C as the number of
> addresses in the collectable space and B as the (average) number of
> addresses per block, then the birthday effect takes over when R*C*B
> approaches 2**32.

It seems to me that either C should be expressed in objects, or B
should disappear.  A minor point.

> Basically this means that conservative GCs become useless when
> available memory times required memory times block size is of the
> same order as the number of different pointers.

No.  That is when you would start to see SOME memory accidentally
retained if you ignored the above arguments.  That doesn't make it
unusable.  See below.

> To take it one step further, if your program requires 16K of
> non-pointer "random" data space on a 32-bit addressable machine,
> even with small allocated objects, a "conservative" collector is so
> much junk.  (Uncollected garbage will on average lock down more
> uncollected garbage, and the amount of uncollected garbage will grow
> without bound.)

NO! NO!  And this is the crucial point.  If you have 16K of junk, a
256K heap, unusually distributed data, and a dumb collector, you will
see a bogus pointer.  (In reality the constants are quite a bit
bigger, though application dependent.)  If that bogus pointer pins
another 16K of junk, you'll lock down more memory, etc. etc.  But
bogus pointers rarely pin down another 16K of junk.  A random pointer
into a balanced tree pins down something like 2 nodes on average.
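The claim that ordinary data rarely masquerades as a heap pointer is
easy to spot-check.  A small sketch (the heap address range here is
invented purely for illustration; real placement depends on the
allocator):

```python
import struct

HEAP_LO, HEAP_HI = 0x08050000, 0x08090000   # invented 32-bit heap range

def looks_like_pointer(word):
    """Would a conservative scanner mistake this 32-bit value for a pointer?"""
    return HEAP_LO <= word < HEAP_HI

# Small integers (loop counters, lengths, flags) cluster near zero.
print(sum(looks_like_pointer(i) for i in range(100000)))        # 0

# ASCII text: every byte is 0x20..0x7e, so any 32-bit word read out of
# it is at least 0x20202020 -- far above a heap placed this low.
text = b"conservative collection"
words = [struct.unpack_from("<I", text, i)[0] for i in range(len(text) - 3)]
print(sum(looks_like_pointer(w) for w in words))                # 0
```

The bit patterns that do cause trouble are the application's own stale
or out-of-scope pointers, which is exactly what blacklisting (allocating
around previously seen bogus pointers) is aimed at.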
Wentworth published a paper about conservative collection in a fully
utilized 16-bit address space a while back.  It worked fine for many
applications and not for others.  This is discussed more in my PLDI 93
paper.

Applications written for conservative collection, even in C, can
easily avoid putting much "junk" into heap objects, further reducing
this effect.  It is normally easy to allocate a large fraction of the
heap using pointer-free allocation primitives.  For a language like
Ada, it's easy to supply more detailed layout information for heap
objects.  That's cheap.  Stack layout information is harder.

> What about 64-bit machines?  Now another factor jumps out to bite
> you.  Bit patterns in memory are not random.  In particular, low
> numbers are favored, and are more likely to correspond to actual
> memory locations.

No.  The 64-bit machines of which I'm aware allocate the heap starting
at something like 2**32 by default.  The collecting allocator could
place it wherever it wants.

> On a machine with a 64-bit address space, my guess is that a
> conservative collector is useful up to a megabyte of non-pointer
> live data.

I've routinely run fairly long running applications with more than
that on 32-bit machines.  So have a number of other readers, I
suspect.

> Now let's go with an absolute best case scenario for long lived
> systems.  The application has exactly one word of non-pointer
> storage that is non-garbage, points to a valid heap address, and
> doesn't change.  Each dynamically allocated object also contains one
> non-pointer word and a pointer that is sometimes non-null.  It is
> easy to prove that over time more and more of the store will not be
> available, and eventually you will run out of storage.

THIS IS OBVIOUSLY FALSE.  Consider the case in which each pointer
points to a NIL object, and the junk in the accidentally referenced
object and in the NIL object points outside the heap area.  (Our
allocator would in fact allocate around the original nonchanging
value.
Unchanging values are easy.)

> So "conservative" collectors are of no use in applications where you
> want to do your best to keep the application running.  If you don't
> believe the above, write the program--I'll make sure you have a good
> enough RNG--and run it.  Memory consumption (you measure it after GC
> runs of course) will stay low initially, but eventually you will
> lock in some garbage, then memory consumption rises rapidly.  You
> probably turn the program off before it crashes because you are
> spending most of your time in garbage collection.

Please try it for yourself.  Until April I was reading my mail on a
conservatively garbage collected system that couldn't possibly have
stayed up for weeks by this chain of reasoning.  (It did.)  Many other
systems that are in routine use couldn't possibly work either.

-- 
Standard disclaimer ...
Hans-Juergen Boehm
boehm@mti.sgi.com