Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?
From: Mike Pall <mikelu-1204@...>
Date: Wed, 18 Apr 2012 19:12:48 +0200

Miles Bader wrote:
> Mike Pall <mikelu-1204@mike.de> writes:
> >> I just want to make sure that now LuaJit 2 whether surport more
> >> than 2GB memory allocation on FreeBSD/x64 or not.
> >
> > Rest assured, you do NOT want to allocate more than that amount of
> > Lua objects. The garbage collector in LuaJIT 2.0 (which is the
> > same as Lua 5.1) is simply not up to the task. Your program would
> > most certainly slow to a crawl with that many live objects.
> 
> Hmm, I've exceeded LuaJIT's memory limits, and haven't noticed any
> _obvious_ slowness as I approached the limit...

The main challenge for GC on modern CPUs is to avoid cache
thrashing for large datasets. There are several things to
consider:

* All objects in a garbage collected VM form a connected graph, so
  it makes sense to differentiate them by their graph theoretic
  aspects, which has a direct impact on their cache effect:

  0. value: no extra node
  1. box: zero fan-out
  2. link: low fan-out
  3. list: high fan-out

  All current versions of Lua and LuaJIT use variants of a
  tagged-value representation, which means class 0 is non-empty.
  Others, e.g. CPython don't use class 0 (every object reference
  is a pointer) or use it to a lesser extent. E.g. most VMs have
  to box FP numbers, whereas Lua has them 'inline' as a value.

  Lua/LuaJIT object types fall into the following categories:
  0: nil, boolean, number, lightuserdata
  1: string, cdata (LuaJIT only)
  2: upvalue, userdata, closure, prototype (medium fan-out)
  3: table (hash, array), thread (stack)

* The mark phase of a GC has to follow the graph from all GC roots,
  traverse all live objects and mark them. With that in mind, you
  can estimate the amount of work that needs to be done for each
  category. The following discussion assumes you know the category
  of each reference before following any pointer, which is true
  for a tagged-value representation.

  For category 0, you don't have to follow a pointer and there's
  nothing to mark. For category 1, you have to mark the object.
  You only have to follow the pointer if the mark is part of the
  object itself (true for Lua GC). There are other options for
  marking an object (separate GC metadata or card marking) which
  avoid the overhead of pointer following for category 1 objects.

  For category 2 and 3 you always have to follow the pointer and
  queue the outgoing references for later marking (shortcut
  possible for fan-out 1, e.g. upvalues). Queuing objects with a
  linked-list GC (Lua GC) is simple, but may be expensive for
  large category 3 objects. This may cause evictions and then all
  objects need to be loaded into the cache twice. Adding only the
  pointers themselves to a growable list is more cache-conscious,
  but probably only makes sense for an area-based GC (otherwise
  you're wasting the memory for the links).

  Summary: the overhead for a GC is highly dependent on the
  structure of the object graph, which can be estimated by the
  categories of its live objects.

* The overhead for all variants of the classic mark-sweep GC
  algorithm is dominated by the size of the live set of GC
  objects. Either by their number or by their size, depending on
  the GC design and the object category.

  The overhead of generational GCs is dominated by the size of the
  recently modified part of the live set. This only helps if that
  is substantially lower than the total size of the live set.
  Otherwise the extra overhead for keeping the generations may
  not pay off.

  The overhead of moving GCs is mostly dominated by the size of
  objects, independent of their category. OTOH non-moving GCs can
  mostly ignore category 1 objects.

  Linked-list GCs (such as Lua and LuaJIT up to 2.0) suffer more
  from a large number of small objects. The traversed links are
  often quite random wrt. memory addresses, which renders hardware
  cache prefetching ineffective. A segregated, area-based GC with
  separate GC metadata (plan for LuaJIT 2.1) can mitigate this
  effect without penalizing category 1 objects.

> Much of the memory use is in the form of humongous Lua strings (which
> then I pass as input to LPEG), but there's plenty of other stuff
> as... I don't think the strings count for more than about 1GB or so.
> [I'm guessing strings don't count so much because they're single large
> objects?]

Strings are category 1 objects, which means they only need to be
marked, but not traversed. The current Lua and LuaJIT VMs use a
non-moving, linked-list GC, so the size of strings is mostly
irrelevant as far as the GC is concerned.

Thus if your memory usage is dominated by huge strings, you're
unlikely to see the GC slowdown I described. OTOH if you had tons
of tables, taking up gigabytes of memory, you'd certainly notice.

--Mike

Follow-Ups:
- Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?, Dimiter 'malkia' Stanev

References:
- Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?, clive
- Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?, Mike Pall
- Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?, Miles Bader

Prev by Date: Re: Lua 5.2 Length Operator and tables (bug?)
Next by Date: Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?
Previous by thread: Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?
Next by thread: Re: Do LuaJit 2 support more than 2GB memory allocation on FreeBSD/x64 now?
Index(es):
- Date
- Thread