Arseny Vakhrushev wrote:
> Ok, I looked through lj_alloc.c and realized that there is
> probably no way for LuaJIT to use more than 1Gb in 64-bit
> Linux...

Well, it may sound strange, but your best option is to
compile LuaJIT as 32 bit. 32 bit processes can use close to 4GB of
memory under a Linux/x64 kernel. And as I've previously explained,
there's little performance difference with LuaJIT on x86 vs. x64.

> Well, I have a clustered server system where LuaJIT handles
> high-level logic scripting. All objects stored and managed in
> the system are high-level as well. I can easily avoid the above
> problem by scaling the system - adding more daemons and
> connecting them through the loopback interface. However, that
> leads to unnecessary overhead and throws me out of "one machine
> -> one daemon" scheme.

A single process doesn't make good use of a multi-core CPU. And
unless you're very careful and use non-blocking I/O everywhere,
you'll hit I/O bottlenecks. Running 10-20 worker processes on a
quad-core is a common setup.

Also, it's not a good idea to store millions of objects occupying
several gigabytes in a single Lua state. The Lua garbage collector
is simply not up to the task (LuaJIT currently uses the same GC).
It's very, very inefficient for huge out-of-cache workloads. The
GC causes serious cache thrashing and this kills performance.
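
To get a feel for how many small objects fit into a given heap, you can
measure the per-object cost with collectgarbage("count"). This is only a
rough sketch: the object count N is an assumption chosen so it runs
quickly, and the result depends on the VM and build.

```lua
-- Rough per-object memory cost of small tables. Public domain.
local N = 100000                 -- assumption: far fewer than a real workload

collectgarbage()                 -- start from a collected heap
collectgarbage("stop")           -- keep the GC out of the measurement
local before = collectgarbage("count")   -- heap size in KB

local t = {}
for i = 1, N do t[i] = {t} end   -- same object shape as the attached test

local after = collectgarbage("count")
collectgarbage("restart")

-- Includes each object's slot in the outer array t.
local bytes_per_object = (after - before) * 1024 / N
print(string.format("%.1f bytes per object", bytes_per_object))
```

Multiply the result by the number of objects you plan to keep alive to
estimate the heap size the GC has to traverse.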

I've attached a simple test which allocates just enough objects to
stay below 1GB for LuaJIT. Here's the output on my (fast) machine:

[The numbers are much higher with plain Lua, whether x86 or x64.]

0.97	seconds allocation time with stopped GC
1.53	seconds for a full GC
0.93	seconds for a cleanup GC	

1.92	seconds allocation time with enabled GC
1.52	seconds for a full GC
0.99	seconds for a cleanup GC	

1.96	seconds allocation time with enabled GC
2.95	seconds for a full GC with randomized links
1.01	seconds for a cleanup GC with randomized links	

- A full GC takes 50% more time than the allocations themselves.
- If the GC is enabled, it doubles the allocation time.
- To simulate a real application, the links between objects are
  randomized in the third run. This doubles the GC time!
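
The ratios behind these three points can be checked directly from the
timings above (the variable names are just for illustration):

```lua
-- Timings in seconds, copied from the output above.
local alloc_stopped  = 0.97   -- allocation with stopped GC
local full_gc_first  = 1.53   -- full GC after the first run
local alloc_enabled  = 1.92   -- allocation with enabled GC
local full_gc_linear = 1.52   -- full GC, second run
local full_gc_random = 2.95   -- full GC with randomized links

local gc_vs_alloc     = full_gc_first / alloc_stopped
local enabled_vs_stop = alloc_enabled / alloc_stopped
local random_vs_plain = full_gc_random / full_gc_linear

print(string.format("full GC vs. allocation:    %.2fx", gc_vs_alloc))     -- 1.58x
print(string.format("enabled vs. stopped GC:    %.2fx", enabled_vs_stop)) -- 1.98x
print(string.format("randomized vs. plain GC:   %.2fx", random_vs_plain)) -- 1.94x
```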

And that was just for 1GB! Now imagine using 8GB -- a full GC
cycle would keep the CPU busy for a whopping 24 seconds!
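
That estimate is a simple linear extrapolation from the randomized run.
It assumes the GC time scales in proportion to the heap size, which is
probably optimistic for out-of-cache workloads:

```lua
-- Linear extrapolation of the full GC time (assumption: cost scales
-- proportionally with heap size).
local secs_per_gb = 2.95   -- full GC with randomized links, ~1GB heap
local heap_gb     = 8

local estimate = secs_per_gb * heap_gb
print(string.format("%.1f seconds for a full GC of %dGB", estimate, heap_gb))
-- 23.6 seconds for a full GC of 8GB
```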

Ok, so the normal mode is to use the incremental GC. But this just
means the overhead is ~30% higher, it's interleaved with the
allocations, and it evicts the CPU cache every time it runs.
Basically your application will be dominated by the GC overhead and
you'll begin to wonder why it's slow ...
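
To see how the incremental collector slices up the work, you can drive
it by hand with collectgarbage("step"). This is a minimal sketch, not a
benchmark: N is an assumption kept small so it finishes quickly, and the
step count varies between Lua versions and builds.

```lua
-- Drive the incremental GC manually and count the steps. Public domain.
local N = 100000                  -- assumption: small N for a quick run

local t = {}
for i = 1, N do t[i] = {i} end
t = nil                           -- everything allocated above is now garbage

local steps = 0
local x = os.clock()
repeat
  steps = steps + 1
until collectgarbage("step", 0)   -- returns true once a cycle has finished
print(steps, "incremental steps,", os.clock() - x, "seconds")
```

In a real application these steps are triggered by allocations, so the
same work ends up spread between them.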

tl;dr version: Don't try this at home. And the GC needs a rewrite
(postponed to LuaJIT 2.1).

-- Allocate, traverse and collect lots of small objects.
-- Shows cache thrashing by the GC. Public domain.

local N = 14000000
local T

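local function alloc(text)
  local t = {}
  local x = os.clock()
  -- Each small object holds one reference back to the big table.
  for i=1,N do t[i] = {t} end
  print(os.clock()-x, "seconds allocation time"..text)
  T = t
end

local function collect(text)
  local x = os.clock()
  collectgarbage()  -- Full GC with all objects still alive.
  print(os.clock()-x, "seconds for a full GC"..text)
  T = nil
  x = os.clock()
  collectgarbage()  -- Full GC which frees all objects.
  print(os.clock()-x, "seconds for a cleanup GC"..text, "\n")
end

collectgarbage("stop")
alloc(" with stopped GC")
collect("")
collectgarbage("restart")

alloc(" with enabled GC")
collect("")

alloc(" with enabled GC")
local random = math.random
-- Point every object at a random other object to defeat cache locality.
for i=1,N do T[i][1] = T[random(N)] end
collect(" with randomized links")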