Object Benchmark Tests


This page evaluates method-call and field-access time for several approaches to object orientation in Lua. The concern is raw execution speed, not memory use or object-creation time.

The Code

-- Benchmarking support.
do
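  -- Compile `code` into a loop body and time `count` iterations on `ob`.
  local function runbenchmark(name, code, count, ob)
    -- assert() surfaces syntax errors in the generated chunk.
    local f = assert(loadstring([[
        local count,ob = ...
        local clock = os.clock
        local start = clock()
        for i=1,count do ]] .. code .. [[ end
        return clock() - start
    ]]))
    io.write(f(count, ob), "\t", name, "\n")
  end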

  local nameof = {}
  local codeof = {}
  local tests  = {}
  function addbenchmark(name, code, ob)
    nameof[ob] = name
    codeof[ob] = code
    tests[#tests+1] = ob
  end
  function runbenchmarks(count)
    for _,ob in ipairs(tests) do
      runbenchmark(nameof[ob], codeof[ob], count, ob)
    end
  end
end

function makeob1()
  local self = {data = 0}
  function self:test()  self.data = self.data + 1  end
  return self
end
addbenchmark("Standard (solid)", "ob:test()", makeob1())

local ob2mt = {}
ob2mt.__index = ob2mt
function ob2mt:test()  self.data = self.data + 1  end
function makeob2()
  return setmetatable({data = 0}, ob2mt)
end
addbenchmark("Standard (metatable)", "ob:test()", makeob2())

function makeob3()
  local self = {data = 0}
  function self.test()  self.data = self.data + 1  end
  return self
end
addbenchmark("Object using closures (PiL 16.4)", "ob.test()", makeob3())

function makeob4()
  local public = {}
  local data = 0
  function public.test()  data = data + 1 end
  function public.getdata()  return data end
  function public.setdata(d)  data = d end
  return public
end
addbenchmark("Object using closures (noself)", "ob.test()", makeob4())

addbenchmark("Direct Access", "ob.data = ob.data + 1", makeob1())

addbenchmark("Local Variable", "ob = ob + 1", 0)


-- The iteration count may be given as the first script argument.
runbenchmarks(tonumber((...)) or 100000000)

The Results (current version)

These are the results for the current version. All times are user-mode CPU time in seconds (with sub-second resolution where the OS supports it) for 100 million iterations (1e8, the default).

| 2010-01-15 MikePall
| Intel Core2 Duo E8400 3.00GHz
| Linux x86, GCC 4.3.3 (-O2 -fomit-frame-pointer for both Lua and LuaJIT)
| Lua 5.1.4 (lua objbench.lua) vs.
| LuaJIT 1.1.5 (luajit -O objbench.lua) vs.
| LuaJIT 2.0.0-beta2 (lj2 objbench.lua)

  Lua    LJ1    LJ2
-----------------------------------------------------
14.08   2.16   0.1   Standard (solid)
14.92   4.62   0.1   Standard (metatable)
14.28   2.66   0.1   Object using closures (PiL 16.4)
 9.14   1.68   0.1   Object using closures (noself)
 7.30   1.10   0.1   Direct Access
 1.22   0.34   0.1   Local Variable

| 2008-04-16 MikePall
| Intel Core2 Duo E6420 2.13GHz
| Linux x86, GCC 4.1.2 (-O3 -fomit-frame-pointer for both Lua and LuaJIT)
| Lua 5.1.3 (lua objbench.lua) vs. LuaJIT 1.1.4 (luajit -O objbench.lua)

17.93   3.11   Standard (solid)
20.36   6.25   Standard (metatable)
19.34   3.73   Object using closures (PiL 16.4)
12.76   2.23   Object using closures (noself)
 7.53   1.55   Direct Access
 2.59   0.47   Local Variable

| 2008-04-17 LeonardoMaciel
| Intel Core 2 Duo T7200 2.00 GHz
| WinXP, MSVC9 (VS 2008)
| Lua 5.1.3 (using luavs.bat) vs. LuaJIT 1.1.4 (using luavs.bat)
| [NOTE: this measurement probably didn't use luajit -O]

17.52  10.78  Standard (solid)
19.74  12.55  Standard (metatable)
18.31  10.88  Object using closures (PiL 16.4)
14.20   5.09  Object using closures (noself)
 7.99   5.94  Direct Access
 1.70   0.41  Local Variable

| 2008-04-19 DougCurrie
| Pentium M 2.00 GHz
| WinXP, GCC 3.4.5 (mingw special)
| Lua 5.1.3 (from wxLua build) vs. LuaJIT 1.1.4 (luajit -O objbench.lua)

28.68   4.76  Standard (solid)
31.23   9.49  Standard (metatable)
30.32   5.38  Object using closures (PiL 16.4)
19.60   3.27  Object using closures (noself)
12.47   2.26  Direct Access
 2.72   0.51  Local Variable

<--- Add your results here if you like.
<--- Please indicate the date and your name or wiki page.
<--- Add your CPU, OS, compiler and Lua and/or LuaJIT version.

The Results (old version)

These are the results for the old version, missing the "Local Variable" test and measuring elapsed time in seconds.

Windows XP SP2  Intel P4 1.8a

Standard (solid)  Time: 34
Standard (metatable)  Time: 37
Object using closures (PiL 16.4)  Time: 40
Object using closures (noself)  Time: 29
Direct Access  Time: 19

Windows XP x64 SP1  AMD Athlon64 3500+ (64-bit Lua)

Standard (solid)  Time: 22
Standard (metatable)  Time: 23
Object using closures (PiL 16.4)  Time: 25
Object using closures (noself)  Time: 18
Direct Access  Time: 11

Windows Vista Ultimate(32bit), AMD Athlon X2 4200+ (Vanilla Lua 5.1.1 / LuaJIT 1.1.2)

Standard (solid)  Time: 26 / 11
Standard (metatable)  Time: 29 / 15
Object using closures (PiL 16.4)  Time: 30 / 12
Object using closures (noself)  Time: 20 / 6
Direct Access  Time: 13 / 8

Linux Xubuntu Feisty Fawn (32-bit), Intel P4 Celeron 2.4 GHz (Vanilla Lua 5.1.1)

Standard (solid)  Time: 34
Standard (metatable)  Time: 38
Object using closures (PiL 16.4)  Time: 40
Object using closures (noself)  Time: 25
Direct Access  Time: 20

Windows XP Prof. SP2, Intel PIII 500 MHz (Vanilla Lua 5.1.1 / LuaJIT 1.1.2)

Standard (solid)  Time: 133 / 60
Standard (metatable)  Time: 146 / 76
Object using closures (PiL 16.4)  Time: 147 / 64
Object using closures (noself)  Time: 99 / 32
Direct Access  Time: 67 / 36

Conclusion

Direct access to a table through a local reference is by far the fastest way to work with object data (as expected). It serves as the baseline for the other methods.

The noself method is the second fastest here. It relies strictly on closures and on locals defined within the enclosing function, returning nothing but a public interface. If the tests are modified to perform ten additions per method call, it can exceed the speed of Direct Access, since the function-call overhead is then spread over more work. The noself and PiL 16.4 methods are the only two that have any support for privately scoped variables.
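A rough sketch of that modification follows. The name makeob4x10 is hypothetical and not part of the benchmark above, and the iteration count is kept small. Timings depend on the machine and the Lua version, so none are shown.

```lua
-- Hypothetical variant of makeob4: ten additions per method call.
local function makeob4x10()
  local public = {}
  local data = 0
  function public.test()
    data = data + 1; data = data + 1; data = data + 1; data = data + 1; data = data + 1
    data = data + 1; data = data + 1; data = data + 1; data = data + 1; data = data + 1
  end
  function public.getdata()  return data  end
  return public
end

local n = 100000  -- small on purpose; the benchmark itself defaults to 1e8

-- noself object: one call, ten upvalue additions.
local ob = makeob4x10()
local start = os.clock()
for i = 1, n do ob.test() end
local tclosure = os.clock() - start

-- Direct access: no call, ten table-field additions.
local t = {data = 0}
start = os.clock()
for i = 1, n do
  t.data = t.data + 1; t.data = t.data + 1; t.data = t.data + 1; t.data = t.data + 1; t.data = t.data + 1
  t.data = t.data + 1; t.data = t.data + 1; t.data = t.data + 1; t.data = t.data + 1; t.data = t.data + 1
end
local tdirect = os.clock() - start

io.write(tclosure, "\tnoself, ten additions per call\n")
io.write(tdirect, "\tdirect access, ten additions per iteration\n")
```

Both loops perform the same number of additions, so any difference comes from the call overhead on one side and the table indexing on the other.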

Every other method is slower than direct access. The metatable method adds an extra lookup through __index on every call, yet its speed remains comparable to the PiL 16.4 method.
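The extra lookup can be made visible with rawget, which bypasses metatables. This is only an illustration of where the method lives, not a measurement; the variable names are chosen for this example.

```lua
-- "Solid" object: the method is stored in the object itself.
local solid = {data = 0}
function solid:test()  self.data = self.data + 1  end

-- Metatable object: the method is stored in the shared metatable.
local mt = {}
mt.__index = mt
function mt:test()  self.data = self.data + 1  end
local viamt = setmetatable({data = 0}, mt)

print(rawget(solid, "test") ~= nil)  --> true   (found in the object)
print(rawget(viamt, "test") ~= nil)  --> false  (object has no such field)
print(viamt.test == mt.test)         --> true   (resolved through __index)
```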

The method mentioned in PiL 16.4 adds a private scope advantage and uses a closure to store self, but in the results above it is consistently somewhat slower than a method call that passes a proper 'self', which Lua optimizes.
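The difference between the two calling styles can be sketched as follows. The names are illustrative only. In the closure style, self is an upvalue of the method; in the colon style, ob:test() is syntactic sugar for ob.test(ob).

```lua
-- Closure style (PiL 16.4): `self` is captured as an upvalue.
local function makeclosureob()
  local self = {data = 0}
  function self.test()  self.data = self.data + 1  end
  return self
end

local a = makeclosureob()
local f = a.test
f()            -- works without a receiver, since self is an upvalue
print(a.data)  --> 1

-- Colon style: `self` is an implicit first parameter.
local b = {data = 0}
function b:test()  self.data = self.data + 1  end
b:test()       -- sugar for b.test(b)
b.test(b)      -- the same call written out
print(b.data)  --> 2
```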

One last thing to note: an early version of the benchmarking code did not take a local reference to the objects but looked them up in the global table instead. This added a few seconds to every test and hit direct access hardest, doubling its time.
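A minimal sketch of that difference, assuming a small iteration count and the illustrative names globalob and localob. Timings vary, so none are shown.

```lua
local n = 100000  -- small on purpose; the benchmark itself defaults to 1e8

globalob = {data = 0}        -- global: looked up by name on every access
local localob = {data = 0}   -- local: held directly in a VM register

local start = os.clock()
for i = 1, n do globalob.data = globalob.data + 1 end
local tglobal = os.clock() - start

start = os.clock()
for i = 1, n do localob.data = localob.data + 1 end
local tlocal = os.clock() - start

io.write(tglobal, "\tglobal reference\n")
io.write(tlocal, "\tlocal reference\n")
```

The global loop performs two extra lookups of the name globalob per iteration, once for the read and once for the write.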

Hope this helps -- AA

Please feel free to add your specs above if they would have some value. Please run them three times and use the average.
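Averaging can also be scripted. The helper below is a hypothetical sketch, not part of the benchmark code: it runs a function a given number of times and returns the mean CPU time.

```lua
-- Hypothetical helper: run f() `runs` times and return the mean CPU time.
local function averagetime(f, runs)
  local total = 0
  for _ = 1, runs do
    local start = os.clock()
    f()
    total = total + (os.clock() - start)
  end
  return total / runs
end

local ob = {data = 0}
local avg = averagetime(function()
  for i = 1, 100000 do ob.data = ob.data + 1 end
end, 3)
io.write(avg, "\taverage of 3 runs\n")
```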
