  lua-l archive

• Subject: Re: fastest way to maintain nested tables of counters?
• From: "E. Toernig" <froese@...>
• Date: Sun, 15 May 2011 19:25:13 +0200

```A boring sunday and I was curious about the performance of the
different implementations, especially regarding the differences
between Lua and LuaJIT.

A quick summary - explanation further down:
-- ORIG ---- lua 7.23s, luajit 0.24s/0.62s ---
-- OPT ----- lua 2.91s, luajit 0.14s/0.32s ---
-- NAIVE --- lua 2.03s, luajit 3.83s/21.34s --- (no typo!)
-- TUNED --- lua 1.84s, luajit 0.15s/1.21s ---

I noticed that LuaJIT is sometimes MUCH slower if the test loop is
slightly modified (Lua shows no difference) so there are two numbers
for LuaJIT, the first for this test loop:

|-- x y n z
|for x=1,100 do
|  for y=1,100 do
|    for n=1,10 do
|      for z=1,20 do
|        inc3(c,x,y,z)
|end end end end

and the second for this one:

|-- n x y z
|for n=1,10 do
|  for x=1,100 do
|    for y=1,100 do
|      for z=1,20 do
|        inc3(c,x,y,z)
|end end end end

To the original question ...

Tim Menzies wrote:
>
> is the following code the best way to handle auto-initialization of nested
> table cells to zero?

|--\[[-- ORIG ---- lua 7.23s, luajit 0.24s/0.62s ---
|local function inc3(c,x,y,z)
|    c[x] = c[x] or {}
|    c[x][y] = c[x][y] or {}
|    c[x][y][z] = c[x][y][z] or 0
|    c[x][y][z] = c[x][y][z] + 1
|end
|local c = {}
|--]]----------------------------------------

Best depends.  But it's definitely not the fastest ;-)
You do a lot of redundant table lookups and they are
code: 12 to 30 times faster then Lua.

An optiomized version would be:

|--\[[-- OPT ----- lua 2.91s, luajit 0.14s/0.32s ---
|local function inc3(c,x,y,z)
|    local b = c[x]  if b==nil then b={} c[x] = b end
|    local a = b[y]  if a==nil then a={} b[y] = a end
|    a[z] = (a[z] or 0) + 1
|end
|local c = {}
|--]]----------------------------------------

Table lookups/stores are minimized.  And it shows:
about two times faster then the original version -
both for Lua and LuaJIT.

> oh-  i've read the automagical table entry [...] and
> i cant quite justify to myself that that approach is
> preferred to the above.

Well, autotables move the nil-tests to the C side.  You'll
get a performance that is similar to pre-populated tables.

And, IMO more important, you don't have to perform these
tests for every use.  You don't have to write functions
like inc3 (see next paragraph).

The call "inc3(c,x,y,z)" in the test-loop is replaced by
the inline code:

|   local t=c[x][y]  t[z]=t[z]+1

First a naive implementation of autotables:

|--\[[-- NAIVE --- lua 2.03s, luajit 3.83s/21.34s ---
|local function zero() return 0 end
|local function auto_tab(a,b,c,d,e)
|    return setmetatable({}, {
|        __index = function(t,k)
|            local v = a(b,c,d,e)
|            t[k] = v
|            return v
|        end
|    })
|end
|local c = auto_tab(auto_tab, auto_tab, zero)
|--]]----------------------------------------

It's expensive on the memory side (each table has its own
metatable and a unique index-function with upvalues) but
Lua can handle it quite well, about 30% faster then the
optimized version though most of this speedup is due to
the inlined increment code (one less function call).

But what's LuaJIT doing?  Especially with the alternative
test loop?  21s?!?  Ten times slower than Lua!

Ok, time for a tuned "unrolled" auto-table version:

|--\[[-- TUNED --- lua 1.84s, luajit 0.15s/1.21s ---
|local auto_0, auto_t0, auto_tt0
|local meta_0   = { __index = function(t,k) local v=0         t[k]=v return v end }
|local meta_t0  = { __index = function(t,k) local v=auto_0()  t[k]=v return v end }
|local meta_tt0 = { __index = function(t,k) local v=auto_t0() t[k]=v return v end }
|function auto_0()   return setmetatable({}, meta_0) end
|function auto_t0()  return setmetatable({}, meta_t0) end
|function auto_tt0() return setmetatable({}, meta_tt0) end
|local c = auto_tt0()
|--]]----------------------------------------

This version is only slightly faster (1.84s vs 2.03s) but it

LuaJIT is back on track, about the same runtime as the optimized
non-auto-table version (impressive) but it still chokes on the
alternative test-loop (8 times slower but still faster than
Lua).

> but i'd like the advice of some Lua gurus!

Well, the usual advice: don't optimize too early ;-)

I.e.: For 2 million increments you can save at max 7.2seconds.
If the total runtime to generated these 2 million increments
is 10 minutes, you save 1.2% runtime - not worth looking at.

Ciao, ET.

Full test code attached ...
```
```--[[-- ORIG ---- lua 7.23s, luajit 0.24s/0.62s ---
local function inc3(c,x,y,z)
c[x] = c[x] or {}
c[x][y] = c[x][y] or {}
c[x][y][z] = c[x][y][z] or 0
c[x][y][z] = c[x][y][z] + 1
end
local c = {}
--]]----------------------------------------

--[[-- OPT ----- lua 2.91s, luajit 0.14s/0.32s ---
local function inc3(t,x,y,z)
local a = t[x]  if a==nil then a={} t[x] = a end
local b = a[y]  if b==nil then b={} a[y] = b end
b[z] = (b[z] or 0) + 1
end
local c = {}
--]]----------------------------------------

--[[-- NAIVE --- lua 2.03s, luajit 3.83s/21.34s ---
local function zero() return 0 end
local function auto_tab(a,b,c,d,e)
return setmetatable({}, {
__index = function(t,k)
local v = a(b,c,d,e)
t[k] = v
return v
end
})
end
local c = auto_tab(auto_tab, auto_tab, zero)
--]]----------------------------------------

--\[[-- TUNED --- lua 1.84s, luajit 0.15s/1.21s ---
local auto_0, auto_t0, auto_tt0
local meta_0   = { __index = function(t,k) local v=0         t[k]=v return v end }
local meta_t0  = { __index = function(t,k) local v=auto_0()  t[k]=v return v end }
local meta_tt0 = { __index = function(t,k) local v=auto_t0() t[k]=v return v end }
function auto_0()   return setmetatable({}, meta_0) end
function auto_t0()  return setmetatable({}, meta_t0) end
function auto_tt0() return setmetatable({}, meta_tt0) end
local c = auto_tt0()
--]]----------------------------------------

if inc3 then
for x=1,100 do
for y=1,100 do
for n=1,10 do
for z=1,20 do
inc3(c,x,y,z)
end
end
end
end
else
for x=1,100 do
for y=1,100 do
for n=1,10 do
for z=1,20 do
local t=c[x][y] t[z]=t[z]+1
end
end
end
end
end
```