lua-users home
lua-l archive


This is great information, and I've learned a bit more about GC.

Mike, would it be possible to include some kind of stats (or would that be wasteful of CPU resources)? For example, a per-type count: how many objects of each type, or of each category, have been found to be still alive.

something like

print( gcstat["table"] ) -> 5000 tables
print( gcstat["thread"] ) -> 100 threads

or by category

print( gcstat[3] ) -> 5100 (threads + tables).

>    0: nil, boolean, number, lightuserdata
>    1: string, cdata (LuaJIT only)
>    2: upvalue, userdata, closure, prototype (medium fan-out)
>    3: table (hash, array), thread (stack)

Even some heuristic would be a good enough start?

This would definitely change my way of doing things. I'm dealing with lots of geometry (meshes, vertices), and I think I'll stick to my guns: use C structures and arrays for data that is cooked, Lua structures for live editing of it, and functions to transfer from one to the other.


On 4/18/2012 10:12 AM, Mike Pall wrote:
Miles Bader wrote:
Mike Pall writes:
I just want to make sure whether LuaJIT 2 supports more than
2GB of memory allocation on FreeBSD/x64 or not.

Rest assured, you do NOT want to allocate more than that amount of
Lua objects. The garbage collector in LuaJIT 2.0 (which is the
same as Lua 5.1) is simply not up to the task. Your program would
most certainly slow to a crawl with that many live objects.

Hmm, I've exceeded LuaJIT's memory limits, and haven't noticed any
_obvious_ slowness as I approached the limit...

The main challenge for a GC on modern CPUs is to avoid cache
thrashing for large datasets. There are several things to consider:

* All objects in a garbage-collected VM form a connected graph, so
   it makes sense to differentiate them by their graph-theoretic
   properties, which have a direct impact on their cache behavior:

   0. value: no extra node
   1. box: zero fan-out
   2. link: low fan-out
   3. list: high fan-out

   All current versions of Lua and LuaJIT use variants of a
   tagged-value representation, which means class 0 is non-empty.
   Others, e.g. CPython, don't use class 0 (every object reference
   is a pointer) or use it to a lesser extent. E.g. most VMs have
   to box FP numbers, whereas Lua has them 'inline' as a value.

   Lua/LuaJIT object types fall into the following categories:
   0: nil, boolean, number, lightuserdata
   1: string, cdata (LuaJIT only)
   2: upvalue, userdata, closure, prototype (medium fan-out)
   3: table (hash, array), thread (stack)

* The mark phase of a GC has to follow the graph from all GC roots,
   traverse all live objects and mark them. With that in mind, you
   can estimate the amount of work that needs to be done for each
   category. The following discussion assumes you know the category
   of each reference before following any pointer, which is true
   for a tagged-value representation.

   For category 0, you don't have to follow a pointer and there's
   nothing to mark. For category 1, you have to mark the object.
   You only have to follow the pointer if the mark is part of the
   object itself (true for Lua GC). There are other options for
   marking an object (separate GC metadata or card marking) which
   avoid the overhead of pointer following for category 1 objects.

   For category 2 and 3 you always have to follow the pointer and
   queue the outgoing references for later marking (shortcut
   possible for fan-out 1, e.g. upvalues). Queuing objects with a
   linked-list GC (Lua GC) is simple, but may be expensive for
   large category 3 objects. This may cause evictions and then all
   objects need to be loaded into the cache twice. Adding only the
   pointers themselves to a growable list is more cache-conscious,
   but probably only makes sense for an area-based GC (otherwise
   you're wasting the memory for the links).

   Summary: the overhead for a GC is highly dependent on the
   structure of the object graph, which can be estimated by the
   categories of its live objects.

* The overhead for all variants of the classic mark-sweep GC
   algorithm is dominated by the size of the live set of GC
   objects. Either by their number or by their size, depending on
   the GC design and the object category.

   The overhead of generational GCs is dominated by the size of the
   recently modified part of the live set. This only helps if that
   is substantially lower than the total size of the live set.
   Otherwise the extra overhead for keeping the generations may
   not pay off.

   The overhead of moving GCs is mostly dominated by the size of
   objects, independent of their category. OTOH non-moving GCs can
   mostly ignore category 1 objects.

   Linked-list GCs (such as Lua and LuaJIT up to 2.0) suffer more
   from a large number of small objects. The traversed links are
   often quite random wrt. memory addresses, which renders hardware
   cache prefetching ineffective. A segregated, area-based GC with
   separate GC metadata (plan for LuaJIT 2.1) can mitigate this
   effect without penalizing category 1 objects.

Much of the memory use is in the form of humongous Lua strings (which
I then pass as input to LPEG), but there's plenty of other stuff as
well... I don't think the strings account for more than about 1GB or so.
[I'm guessing strings don't count so much because they're single large
objects.]
Strings are category 1 objects, which means they only need to be
marked, but not traversed. The current Lua and LuaJIT VMs use a
non-moving, linked-list GC, so the size of strings is mostly
irrelevant as far as the GC is concerned.

Thus if your memory usage is dominated by huge strings, you're
unlikely to see the GC slowdown I described. OTOH if you had tons
of tables, taking up gigabytes of memory, you'd certainly notice.