- Subject: Re: Avoiding FFI allocations + using SSE vectors
- From: Josh Haberman <jhaberman@...>
- Date: Mon, 6 Feb 2012 15:48:31 -0800
On Mon, Feb 6, 2012 at 5:44 AM, David Given <email@example.com> wrote:
> In an earlier life I worked on an object-oriented operating system where
> all objects were reference counted.
> Never, never again.
> Not *only* does it have the problems described above, it's also
> incredibly brittle and easy to break. Forgetting to take a reference or
> drop a reference can lead to obscure and hard-to-find bugs.
There are techniques for dealing with this. If you add a void* to
your ref() and unref() functions indicating the owner of the ref, you
can pretty easily build ref-tracking that can determine who leaked
a reference (and forgetting to take a reference is pretty easy to
debug with Valgrind; it'll tell you you're accessing free'd memory
and where exactly it was freed). I implemented this literally
yesterday for a system I'm currently working on (modeled after
another API I'd used before) and the code reads nicely.
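A minimal sketch of the ref-tracking idea described above, with hypothetical names (the ref()/unref() functions here are illustrative, not any particular library's API): each ref() records an opaque owner pointer, so any reference still outstanding at shutdown can be traced back to whoever took it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One tracking entry per outstanding reference. */
typedef struct Ref {
    const void *owner;   /* opaque tag identifying the holder */
    struct Ref *next;
} Ref;

typedef struct Obj {
    int refcount;
    Ref *refs;           /* list of outstanding refs, for debugging */
    char name[32];
} Obj;

static Obj *obj_new(const char *name) {
    Obj *o = calloc(1, sizeof(*o));
    strncpy(o->name, name, sizeof(o->name) - 1);
    return o;
}

static void obj_ref(Obj *o, const void *owner) {
    Ref *r = malloc(sizeof(*r));
    r->owner = owner;
    r->next = o->refs;
    o->refs = r;
    o->refcount++;
}

static void obj_unref(Obj *o, const void *owner) {
    /* Drop the tracking entry belonging to this owner. */
    for (Ref **p = &o->refs; *p; p = &(*p)->next) {
        if ((*p)->owner == owner) {
            Ref *dead = *p;
            *p = dead->next;
            free(dead);
            break;
        }
    }
    if (--o->refcount == 0)
        free(o);
}

/* At shutdown, any surviving entry names the leaker. */
static void obj_report_leaks(const Obj *o) {
    for (const Ref *r = o->refs; r; r = r->next)
        printf("leaked ref on %s held by %p\n", o->name, r->owner);
}
```

In debug builds the Ref list identifies who forgot to unref(); in release builds the tracking can be compiled out, leaving a plain refcount.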
> And it makes
> the code verbose and difficult to read.
I find it makes ownership semantics clearer. Since unref() is
an explicit operation, it is clearer where your use and ownership
of the object ends. With garbage collection the ownership model
is much more haphazard (which is fine for high-level programming
like Lua, but less so IMO for systems programming in C or C++).
> *And* since dropping a reference
> can conceivably lead to the object being destroyed then and there, if it
> was the last reference, it means you have to be really careful when
> references get dropped.
The flip side is that destruction happens at a predictable time.
Garbage collection, by contrast, is highly non-deterministic, so
something that works 99% of the time can fail if a GC cycle happens
to be initiated in a critical section.
Other downsides to GC: you have to maintain a list of root objects
and make sure that all roots are discoverable when the GC runs;
most GC algorithms stop all threads and have practically unbounded
latency for a full collection; and GC often requires tuning for
demanding workloads.
IMO the worst drawback of refcounting by far is that by itself it
cannot collect cyclic references.
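The cycle problem can be shown in a few lines (a hypothetical sketch, not code from the post): two nodes that hold counted references to each other keep each other's count above zero even after every external reference is gone, so a plain refcounter never frees them without a separate cycle detector or tracing pass.

```c
#include <stdlib.h>

typedef struct Node {
    int refcount;
    struct Node *other;   /* counted reference to the peer node */
} Node;

static Node *node_new(void) {
    Node *n = calloc(1, sizeof(*n));
    n->refcount = 1;      /* the creator's reference */
    return n;
}

static void node_unref(Node *n) {
    if (n && --n->refcount == 0) {
        node_unref(n->other);  /* release our reference into the cycle */
        free(n);
    }
}

/* Build a cycle: a and b each take a counted reference to the other.
 * When the creator then drops both of its own references, each node
 * still has refcount 1 (the reference held by its peer), so neither
 * is ever freed: the cycle leaks. */
```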
A while ago I came across a paper with a fascinating premise, which
is essentially that GC and refcounting "asymptotically converge", to
use Mike's words. I haven't taken the time to dive deeply into it, but
its abstract is:
Tracing and reference counting are uniformly viewed as
being fundamentally different approaches to garbage
collection that possess very distinct performance
properties. We have implemented high-performance
collectors of both types, and in the process observed that
the more we optimized them, the more similarly they
behaved - that they seem to share some deep structure.
We present a formulation of the two algorithms that shows
that they are in fact duals of each other. Intuitively,
the difference is that tracing operates on live objects,
or "matter", while reference counting operates on dead
objects, or "anti-matter". For every operation performed
by the tracing collector, there is a precisely
corresponding anti-operation performed by the reference
counting collector.
Using this framework, we show that all high-performance
collectors (for example, deferred reference counting and
generational collection) are in fact hybrids of tracing
and reference counting. We develop a uniform cost-model
for the collectors to quantify the trade-offs that result
from choosing different hybridizations of tracing and
reference counting. This allows the correct scheme to be
selected based on system performance requirements and the
expected properties of the target application.