GC by tracking

lua-l archive
[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]
Subject: GC by tracking
From: skaller <skaller@...>
Date: 08 Jan 2005 13:35:52 +1100
On Sat, 2005-01-08 at 04:38, Mark Hamburg wrote:

> If this all just seems like rambling or I'm wildly off track, feel free to
> tell me to sleep more before trying to study the Lua GC.

LOL!

I have a feeling you are trying too hard: you're trying to calculate
when to do what and by how much.. 

It's impossible and unnecessary -- you should be much lazier.

I'm very serious about this, my explanation below is a long
post -- and only addresses major collections, however the
principle applies generally.


There are two distinct problems.

(a) a program that uses more and more memory and then terminates.
(b) a long running program

Problem (a) is very hard: some problems are limited by memory
and other need to be done fast. Because allocation of reachable
objects is very rapid and follows an indeterminate pattern, 
there is no logical way to decide when to collect.

A good algorithm is: don't collect until you run out of
memory. This is the best possible algorithm. It wastes no
time collecting when no collection is needed, and always
collects when memory is required so it meets both possible
design criteria: it is simultaneously as fast as possible
whilst allowing the largest possible problem to run.

So we have *solved* the problem for all type (a) programs.
In fact these are of little interest despite being the 
most common class of program. An example is a compiler.
Such programs are not real time so it just doesn't matter.
Peak memory use just isn't important. You can of course
slightly adjust the algorithm to be 'more reasonable',
but really it does not matter.

So only class (b) programs are of interest. Long running.

All such programs are necessarily periodic for the simple
reason that computers have bounded  memory, and there are
real time bounds on how fast the program must respond to
stimuli (latency) to be useful, and how much input it
must handle (bandwidth).

So, given this picture, I will show how to do major collections.
You should note immediately there is only one question here:

"what is the period between major collections?"

I will assume the program is *clocked* somehow. The clock
can be a real timer, it can be the act of allocating memory,
or, in the case of the Lua collector in work4, some random
points in the code where the designer decided to put a tick.
(These are the calls to the collector).

To begin we must schedule major collections. This is the
only reliable way to recover garbage, it recovers
all of it, and, most importantly -- it can count how
much garbage it recovered.

We will *ignore* generational and incremental behaviour
for the moment .. it will turn out to be irrelevant
to major collections.

We will *shedule* major collections like this: there is a variable
somewhere called 'time-to-next-collection' and every tick of
the clock we subtract one from it. When it goes to zero,
we do a major collection.

So -- please note there is only one problem to solve:
what is the timeout to the next collection? It is a single
number, there is nothing complicated.

The simplest solution is

	timeout = period


This just runs the major collection every N ticks for some
arbitrary constant N. This is a very good solution and 
one of the most common. You just run your program, measure
how much garbage is collected each collection, and you can
then adjust N up or down by a simple factor to limit garbage
to a desired fraction of memory at the expense of using up
more CPU. The crudest assumption is that the relationship
is linear, and that assumption is always correct when
you're fine tuning.

The main problem with this solution is that it isn't 
adaptive. It doesn't change as the load changes.

How can we do that? Well, there are exactly three variables:

	G -- garbage
	M -- memory before allocation
	R = M - G -- memory after allocation

We must express our goal using only these variables. For example:

	G/M < 0.1

is a goal, where we wish to limit garbage to 10% of memory.

How do we achieve this? It is like driving  a car.
You don't aim exactly at one place. You have an idea
where you want to go, aim roughly in the right direction,
and then observe when you're drifting off course and make
a correction. That is -- the process is one of using negative
feedback.

Let's rewrite our goal as:

	G = 0.1 M

Now we can *measure* how close we are to achieving it:

	E = G(desired) - G(actual)

We do NOT know exactly how to use E to precisely recalculate
the period of the collection. But we certainly know that
if E is positive, we are collecting too fast (not enough garbage)
and if it is negative we are collecting too slowly (too much
garbage).

So clearly, if we add to the period "something like" a fraction
of E, we will fix the problem. In particular calculate

	D = f(E)

where f is some monotone function, and now:

	period = period + D

is used to adjust the collection rate.

The point of this analysis is to show that:

"The solution is independent of all other factors:
it makes no difference at all if you are doing some
incremental or generational collection, if the
load on the program varies over time, what the
program is actually calculating. etc"

The algorithm is simple, universal, and it is also
basically the only possible general solution.

The point is that the whole problem is reduced
to chosing the function f. Nothing else matters.

A common solution is a polynomial:

	f(x) = c0 + c1 * x + c2 * x^2 + ..

and typically a quadratic is enough. 

As I mentioned in another post, THIS ALWAYS WORKS
with even half way sensible values for the coefficients.

So now the whole problem is reduced to 3 numbers:

	c0, c1, c2

and you can just put them in a Lua table and have the
collector fetch them to reshedule itself.

We have fixed the algorithm completely now. 
There is no more code to write.
It is all up to the values c0, c1, c2 which can
be set by the user. The only remaining
problem is to provide sensible defaults.
We could try:

	c0 = 1 
	c1 = 0.1 
	c2 = 0.01

[This basically means it will take 10 periods to adjust
to a slow change.]

Just to review:

Q: how does generational or incremental collection
affect the sheduling of major collections?

A: it is entirely transparent. If some other
process such as an additional incremental
collection occurs, which cuts down the
amount of garbage, then the error term E
will reflect that and the major collections
will automatically shedule with a longer period.

Q: what happens if the load increase,
for example in a telephony system, people don't make
calls when they're asleep .. so load on an exchange
depends on time of day. But also there might be
an international emergency which greatly increases
the load..

A1: the collector will lag behind for a while,
but if necessary the quadratic term will increase
its rate of adjustment until it runs at a frequency
appropriate to the load.

A2: nothing -- if your clock is memory allocations,
then increasing the rate of memory allocations 
in real time is irrelevant.

Q: why don't we adjust exactly and immediately to
an error?

A: if you are 1 degree off course, would you suddenly
point your car wheels at 90% to the direction of motion
to fix the problem? No, you make a small correction.
If that isn't enough, you make bigger ones.

Q: can it fail?

A: of course it can fail. No computer can solve a problem
requiring X memory if it has less than X memory,
and you can't respond to stimuli at rate R if your
CPU is slower than R.

Certainly, if you choose bad values for the coefficients,
your program may crash and burn before you can adjust.

You can try different coeffients. However your
program can still lose track if there is an unexpected
demand. There is no way around this.

On Apollo 11, just as the Lunar Module was coming in
to land on the moon .. the altimeter cut out!

The heavy load on the instruments caused the software
to dump certain programs to free up resources..
and the altimeter isn't very useful whilst orbiting,
it had a low priority.

Woops! They forgot the priorities changed during
the landing .. the software was patched 'on the fly' :)

Q: is there any mathematics to calculate the c0, c1, and c2
optimially?

A: Yes. It's called a Z-transform. However it isn't
really relevant here. Z-transforms are used to optimise
tracking of phenomena with well known behavioural
characteristics, such as physical phenomena.

Garbage collection doesn't fit into that class.
In fact, I built the worlds first digital dimmer
which uses a software phase locked loop, and I can
tell you it is very hard to get the right coefficients
even for rigidly controlled signals such as mains
power frequency.

The bottom line is that the coefficients can only
be determined by trial and error, and the best
tool around is a human brain together with a graph
of the variables.


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net
Follow-Ups:
- Re: GC by tracking, Mark Hamburg
References:
- Re: GC [was Re: Lua 5.1 (work4) now available], Mark Hamburg
Prev by Date: select - work4
Next by Date: Re: FW: srlua extension to run with glued zip file
Previous by thread: Re: GC [was Re: Lua 5.1 (work4) now available]
Next by thread: Re: GC by tracking
Index(es):
- Date
- Thread