I've been trying Lua 5.1 (work) with our game, and I love the new
incremental GC.  It doesn't appear to be aggressive enough for our use
in stock form, however.  For our application (a console game), we need
to keep the memory high-water mark as low as possible, without
sacrificing too much performance.

After some experimenting, I see a couple of possible solutions.  One is to
increase the per-step limit in luaC_step:

void luaC_step (lua_State *L) {
  global_State *g = G(L);
  l_mem lim = (g->nblocks - (g->GCthreshold - GCSTEPSIZE)) * 2;
  do {
    lim = singlestep(L, lim);
    if (g->gcstate == GCSfinalize && g->tmudata == NULL)
      break;  /* do not start new collection */
  } while (lim > 0);
  g->GCthreshold = g->nblocks + GCSTEPSIZE - lim/2;
  lua_assert((long)g->nblocks + (long)GCSTEPSIZE >= lim/2);
}

From my analysis, the "* 2" factor seems to be the controlling element.
Increasing GCSTEPSIZE also increases the limit, but decreases the
frequency at which the GC is run.  AFAICT, the factor of 2 has been
chosen more or less arbitrarily; increasing it to 8 (while also changing
the "/2" factors to "/8" in the last two lines) has a very noticeable
impact on the high-water mark in our game.  My question to the Lua
authors is: is the factor of 2 "special" in any way?

[My speculative reasoning why the value might be 2: if you assume that
singlestep() collects lim bytes (it doesn't, but let's assume it does),
then each run through luaC_step() decreases the GCthreshold by
GCSTEPSIZE bytes (and decreases nblocks by 2*GCSTEPSIZE bytes).  If
instead of 2, we have a factor alpha, then each step would decrease the
threshold by (alpha-1)*GCSTEPSIZE bytes (and decrease nblocks by
alpha*GCSTEPSIZE bytes).]
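
The arithmetic in the bracketed paragraph above can be checked with a small
toy model.  This is only a sketch: ToyGC, TOY_STEPSIZE and toy_step are
hypothetical names, not part of Lua; the step size of 1024 is an arbitrary
stand-in for GCSTEPSIZE; and the model assumes (as the paragraph does) that
the collector frees exactly lim bytes, which the real singlestep() does not.

```c
#include <assert.h>

/* Toy model of the bookkeeping in luaC_step; not Lua's real collector. */
typedef struct {
  long nblocks;      /* bytes currently allocated */
  long GCthreshold;  /* allocation level that triggers the next step */
} ToyGC;

enum { TOY_STEPSIZE = 1024 };  /* stand-in for GCSTEPSIZE; arbitrary value */

/* One pass through the luaC_step arithmetic with multiplier alpha,
   assuming the collector reclaims exactly lim bytes. */
void toy_step(ToyGC *g, long alpha) {
  long lim;
  assert(alpha >= 1);
  lim = (g->nblocks - (g->GCthreshold - TOY_STEPSIZE)) * alpha;
  g->nblocks -= lim;  /* assumption: all lim bytes are freed */
  lim = 0;            /* the work budget is fully consumed */
  g->GCthreshold = g->nblocks + TOY_STEPSIZE - lim / alpha;
}
```

Starting from nblocks == GCthreshold, one call with alpha = 2 lowers nblocks
by 2*TOY_STEPSIZE and the threshold by TOY_STEPSIZE; with alpha = 8 the
drops are 8*TOY_STEPSIZE and 7*TOY_STEPSIZE respectively.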

So that's one way to increase the aggressiveness of the GC.  The other
way would be to run the GC a fixed number of steps each frame.  This is
attractive because the cost is predictable, and based on some quick
benchmarking, it looks like we can run the GC at an average consumption
rate that well exceeds our average generation rate without incurring a
significant performance hit (this is great news!).
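
As a rough illustration of why a fixed per-frame budget keeps the high-water
mark down, here is a toy simulation.  It is a sketch under strong
assumptions: toy_peak_backlog and its parameters are hypothetical, each
"step" is assumed to free a fixed number of bytes, and all garbage is
assumed to be collectable immediately.  The real incremental collector has
mark and sweep phases and behaves less regularly.

```c
/* Simulate `frames` frames of a game loop.  Each frame the script creates
   gen_per_frame bytes of garbage, then the collector runs a fixed number
   of steps, each assumed to free up to bytes_per_step bytes.
   Returns the peak amount of uncollected garbage seen. */
long toy_peak_backlog(long frames, long gen_per_frame,
                      long steps_per_frame, long bytes_per_step) {
  long backlog = 0, peak = 0;
  long f, s;
  for (f = 0; f < frames; f++) {
    backlog += gen_per_frame;          /* allocations made this frame */
    if (backlog > peak) peak = backlog;
    for (s = 0; s < steps_per_frame && backlog > 0; s++) {
      long freed = backlog < bytes_per_step ? backlog : bytes_per_step;
      backlog -= freed;                /* one fixed-cost collector step */
    }
  }
  return peak;
}
```

In this model, with 4000 bytes generated per frame and 8 steps of 1024
bytes, the backlog never exceeds a single frame's allocations; with only 2
steps per frame the collector falls behind and the backlog grows every
frame.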

Unfortunately the current API doesn't support this, but I took a stab at
adding this functionality to lua_gc(), treating the data argument as the
number of steps to run:

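    case LUA_GCSTEP: {
      /* run `data` collector steps; g is the global_State from lua_gc() */
      int i;
      lua_lock(L);
      for( i=0; i<data; i++ ) {
        /* pull the threshold down to the current allocation so that
           luaC_step() always gets a positive work limit */
        if( g->nblocks < g->GCthreshold )
          g->GCthreshold = g->nblocks;
        luaC_step(L);
      }
      lua_unlock(L);
      return 0;
    }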

I'd appreciate any other suggestions on how we might tweak the GC to
suit our purposes.

Thanks,
Jasmin