On Mon, Feb 6, 2012 at 5:44 AM, David Given <dg@cowlark.com> wrote:
> In an earlier life I worked on an object-oriented operating system where
> all objects were reference counted.
>
> Never, never again.
>
> Not *only* does it have the problems described above, it's also
> incredibly brittle and easy to break. Forgetting to take a reference or
> drop a reference can lead to obscure and hard-to-find bugs.

There are techniques for dealing with this.  If you add a void* to
your ref() and unref() functions indicating the owner of the ref, you
can pretty easily build ref-tracking that can determine who leaked
a reference (and forgetting to take a reference is pretty easy to
debug with Valgrind; it'll tell you you're accessing free'd memory
and where exactly it was freed).  I implemented this literally
yesterday for a system I'm currently working on (modeled after
another API I'd used before), and the code reads nicely.
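
To make that concrete, here is a minimal sketch of what I mean by
owner-tracked refcounting.  The names (object, obj_ref, obj_unref,
obj_dump_leaks) are placeholders, not the actual code from my system:

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct ref_entry {
      const void *owner;            /* who took this reference */
      struct ref_entry *next;
  } ref_entry;

  typedef struct object {
      int refcount;
      ref_entry *refs;              /* debug list of outstanding owners */
      /* ... payload ... */
  } object;

  void obj_ref(object *o, const void *owner)
  {
      ref_entry *e = malloc(sizeof *e);
      e->owner = owner;
      e->next = o->refs;
      o->refs = e;
      o->refcount++;
  }

  void obj_unref(object *o, const void *owner)
  {
      ref_entry **p = &o->refs;
      while (*p && (*p)->owner != owner)
          p = &(*p)->next;
      if (*p) {                     /* drop the matching tracking entry */
          ref_entry *e = *p;
          *p = e->next;
          free(e);
      }
      if (--o->refcount == 0)
          free(o);                  /* last reference: destroyed right here */
  }

  /* At shutdown, whatever is left in o->refs names the leakers. */
  void obj_dump_leaks(const object *o)
  {
      for (const ref_entry *e = o->refs; e; e = e->next)
          fprintf(stderr, "leaked ref held by owner %p\n",
                  (void *)e->owner);
  }

Passing the owner pointer at every call site is a little noisy, but
it is exactly what lets you report "leaked ref held by owner X"
instead of just "refcount is still 3 at exit".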

> And it makes
> the code verbose and difficult to read.

I find it makes ownership semantics clearer.  Since unref() is
an explicit operation, you can see exactly where your use and ownership
of the object end.  With garbage collection the ownership model
is much more haphazard (which is fine for high-level programming
like Lua, but less so IMO for systems programming in C or C++).
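
With the (hypothetical) obj_ref()/obj_unref() pair sketched above, a
call site spells this out:

  void process(object *blob, void *me)
  {
      obj_ref(blob, me);        /* our claim on blob starts here */
      /* ... use blob ... */
      obj_unref(blob, me);      /* and visibly ends here; blob may be
                                   destroyed on this line if we were
                                   the last owner */
  }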

> *And* since dropping a reference
> can conceivably lead to the object being destroyed then and there, if it
> was the last reference, it means you have to be really careful when
> references get dropped.

The flip side is that destruction happens at a predictable time.  Garbage
collection can also kick in at unexpected times, but unlike a refcount
drop it is highly non-deterministic, so something that works 99% of the
time can fail if a collection happens to be initiated in a critical
section.

Other downsides to GC: you have to maintain a list of root objects
and make sure that all roots are discoverable when the GC runs;
most GC algorithms stop all threads and have practically unbounded
latency for a full collection; and GC often requires tuning for
demanding workloads.
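
(In Lua's C API, for example, keeping a root discoverable from C
usually means anchoring the value in the registry.  A small sketch,
not tied to any particular codebase:)

  #include "lua.h"
  #include "lauxlib.h"

  /* Anchor the value on top of the stack in the registry so the
     collector sees it as reachable while C code holds on to it. */
  static int anchor_value(lua_State *L)
  {
      return luaL_ref(L, LUA_REGISTRYINDEX);  /* pops value, returns key */
  }

  /* Release the anchor; after this the GC is free to collect it. */
  static void release_value(lua_State *L, int ref)
  {
      luaL_unref(L, LUA_REGISTRYINDEX, ref);
  }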

IMO the worst drawback of refcounting by far is that by itself it
cannot collect cyclic references.
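
A toy illustration (names made up, just to show the failure mode):

  #include <stdlib.h>

  typedef struct node {
      int refcount;
      struct node *other;          /* strong reference to the peer */
  } node;

  static node *node_new(void)
  {
      node *n = malloc(sizeof *n);
      n->refcount = 1;
      n->other = NULL;
      return n;
  }

  static void node_unref(node *n)
  {
      if (n && --n->refcount == 0) {
          node_unref(n->other);
          free(n);
      }
  }

  int main(void)
  {
      node *a = node_new();
      node *b = node_new();
      a->other = b; b->refcount++;   /* a -> b */
      b->other = a; a->refcount++;   /* b -> a: the cycle */

      node_unref(a);                 /* drop our own handles... */
      node_unref(b);
      /* ...but each node still holds a reference to the other, so
         neither count ever reaches zero and both objects leak. */
      return 0;
  }

A tracing collector has no trouble with this; plain refcounting needs
either a cycle detector or weak references to break the loop.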

A while ago I came across a paper with a fascinating premise, which
is essentially that GC and refcounting "asymptotically converge", to
use Mike's words.  (I believe the paper is "A Unified Theory of
Garbage Collection" by Bacon, Cheng and Rajan.)  I haven't taken the
time to dive deeply into it, but its abstract is:

  Tracing and reference counting are uniformly viewed as
  being fundamentally different approaches to garbage
  collection that possess very distinct performance
  properties. We have implemented high-performance
  collectors of both types, and in the process observed that
  the more we optimized them, the more similarly they
  behaved - that they seem to share some deep structure.

  We present a formulation of the two algorithms that shows
  that they are in fact duals of each other. Intuitively,
  the difference is that tracing operates on live objects,
  or "matter", while reference counting operates on dead
  objects, or "anti-matter". For every operation performed
  by the tracing collector, there is a precisely
  corresponding anti-operation performed by the reference
  counting collector.

  Using this framework, we show that all high-performance
  collectors (for example, deferred reference counting and
  generational collection) are in fact hybrids of tracing
  and reference counting. We develop a uniform cost-model
  for the collectors to quantify the trade-offs that result
  from choosing different hybridizations of tracing and
  reference counting. This allows the correct scheme to be
  selected based on system performance requirements and the
  expected properties of the target application.

Josh