lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


A generational collector is almost certainly going to use more memory since
the partial collections won't get all the garbage. On the other hand, a
generational collector will recycle some memory faster which is potentially
a good thing. The issue is how to avoid the behavior exhibited in work4 that
Mike has so nicely documented.

Here are my thoughts this morning, now slightly more rested and and
caffeinated than last night.

In the non-generational approach, the GC allocates memory up to a threshold
(calculated as twice the total memory at the end of the previous collection)
and then kicks in with a fairly aggressive incremental GC. This is an
interesting variation on incremental collection since a conventional
incremental collector is always running. It probably reduces the number of
mutator/collector conflicts at the expense of making the VM performance
essentially bimodal between times when collection is occurring and times
when collection isn't occurring. The most recent graphs show that this works
pretty well.

What we'd like to do in a generational system is implement a lighter weight
collection that we could run more frequently and thereby delay the
heavy-weight collection.

Though I don't think it strictly need to do so the current partial
collection does all of its marking atomically. (It could probably mark the
gray objects incrementally before going into the atomic phase.) Hence, we
don't want to run the partial collection too often.

So, here is my proposed approach. I'm fuzzy on how estimate works, so I'm
going to express this in terms of total bytes allocated.

1. At the end of a partial collection, if g->totalbytes is greater than
twice (or pick the factor of your choice) the value of g->totalbytes at the
end of the previous full collection, start an incremental full collection
immediately.

2. At the end of a full collection, capture g->totalbytes and set the
threshold for triggering what will be a partial collection to twice this
number.

3. At the end of a partial collection in which g->totalbytes doesn't exceed
the threshold to start a full collection, set the threshold to do another
partial collection at g->totalbytes + the value of g->totalbytes at the end
of the previous full collection.

The effect of this is that only full collections increase the amount of
memory we are prepared to go through before starting the next cycle and if a
partial collection doesn't recover enough we don't wait to do a full
collection.

The worst case memory consumption on this -- ignoring the overshoot from
incremental collection -- is three times the baseline as opposed to the
non-generational approach which settles in at two times. If we were prepared
to make the thresholds for partial collections smaller, we can keep that
number down closer to two.

What this doesn't deal with is a way to have the collector driven forward by
passing step values into collectgarbage. Small steps need to push us toward
partial collection. Large steps need to push us toward full collection. But
I need to study the code a bit more to have a good answer for how to make
that work.

The other optimizations to deal with involve making the partial collection
as fast as possible, tenuring as little data as possible, and making the
incremental collection payoff as much as possible.

1. As fast as possible: If there isn't a separate list of objects in the
nursery to sweep, there probably ought to be. Nothing else is going to end
up unmarked at the end of a partial collection. I didn't see something that
looked like this in the code.

2. Tenuring as little as possible: I believe the only thing that has to be
tenured to preserve invariants is the data reachable from other tenured
objects and this will all be reachable via the gray objects at the start of
a partial cycle. So, with the probably addition of another mark bit, it
seems like one could tenure the objects that need to be tenured and simply
mark the other accessible objects in a partial GC.

3. Getting as much benefit as possible from a partial collection: One thing
that might be interesting to do in the sweep phase of a partial collection
would be to put the memory blocks into a pool rather than freeing them and
use the pool as the first place to turn when allocating memory. (Large
blocks should probably still be freed.) A full GC would actually free the
memory. The benefit of this is that we reduce trips to malloc and free.

If this all just seems like rambling or I'm wildly off track, feel free to
tell me to sleep more before trying to study the Lua GC.

Mark

P.S. I said last night that I was studying work3. Actually, it's work2 that
I have integrated right now. work4 integration probably comes in a week or
so. If I get time after that, I may try some of these tweaks myself and I'll
report back.