On Tue, Oct 28, 2014 at 05:31:57PM -0400, Rena wrote:
<snip>
> Someone mentioned that calling across the language boundary (C to Lua
> and vice-versa) is expensive? (I could never remember if that applies
> to Lua, LuaJIT, or both.) So it might be worth trying to reduce C
> calls?

At the end are the little scripts I threw together to benchmark C function
calls. For the built-in os.time, LuaJIT takes ~2/3 the execution time of Lua
5.1/5.2. But I really needed to crank up the loop unrolling. Fortunately
Lua 5.1, 5.2, and LuaJIT all did their best around 2^11 calls per loop,
although the difference was much more pronounced for PUC Lua.

For 2^27 calls and 2^11 calls per loop I got

  Lua5.1: 3.005s
  Lua5.2: 2.908s
  LuaJIT: 2.007s

I used os.time() because it's built-in, not a pure function, and figured it
was something LuaJIT couldn't optimize very well. On Linux time(2) does a
read from shared memory with the kernel so it has a negligible cost.
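
As a rough sanity check on the "negligible cost" claim, a sketch like the
following measures time() directly from C, with no Lua in the loop. The
helper name ns_per_time_call is made up for illustration, and it uses
clock(), so it reports CPU time rather than wall-clock time.

```c
/* Baseline cost of time(2) from plain C, with no interpreter involved.
 * ns_per_time_call is a hypothetical helper, not part of the scripts below. */
#include <assert.h>
#include <time.h>

/* volatile so the compiler has to keep every store in the loop */
static volatile time_t sink;

/* Returns the average CPU time per time() call, in nanoseconds. */
double ns_per_time_call(unsigned long ncalls) {
	assert(ncalls > 0);

	clock_t begin = clock();
	for (unsigned long i = 0; i < ncalls; i++)
		sink = time(NULL);
	clock_t end = clock();

	return (double)(end - begin) / CLOCKS_PER_SEC * 1e9 / (double)ncalls;
}
```

Calling ns_per_time_call(1UL << 27) from a small main() and printing the
result gives a floor to compare the per-call os.time numbers against.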

_However_, it turns out that LuaJIT apparently does optimize calls to its
built-ins. If I replace os.time with a C function

  static int inc(lua_State *L) {
    lua_Number i = lua_tonumber(L, 1);
    lua_pushnumber(L, i + 1);
    return 1;
  } /* inc() */

and replace the call-out in the script below with

  j = inc(j) --> inc is a local

then I got

  Lua5.1: 2.932s
  Lua5.2: 2.885s
  LuaJIT: 2.232s

which is only ~3/4 the execution time of Lua 5.1/5.2.
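
To put the totals in per-call terms: with 2^27 calls, 3.005s works out to
roughly 22ns per iteration for Lua 5.1 and 2.007s to roughly 15ns for
LuaJIT. That figure includes the interpreter's own work (the assignment and
the amortized loop overhead), not just the C call. A small sketch of the
arithmetic, where ns_per_call and time_ratio are hypothetical helper names:

```c
/* Convert a total run time into an approximate cost per call.
 * ns_per_call and time_ratio are hypothetical helpers for illustration. */
#include <assert.h>

/* seconds: user time for the whole run; log2_ncalls: 27 means 2^27 calls. */
double ns_per_call(double seconds, unsigned log2_ncalls) {
	assert(log2_ncalls < 64);
	return seconds * 1e9 / (double)(1ULL << log2_ncalls);
}

/* Fraction of the baseline run time that the candidate needed. */
double time_ratio(double candidate_s, double baseline_s) {
	assert(baseline_s > 0.0);
	return candidate_s / baseline_s;
}
```

For example, time_ratio(2.232, 2.932) is about 0.76, which is where the ~3/4
figure for the inc() test comes from.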

Out of curiosity I recompiled Lua 5.2 like so

   make MYCFLAGS="-D'luai_apicheck(...)=(void)0' -march=native -O3 -flto \
   -fwhole-program -fprofile-use=/tmp/gcov" linux

where I had first compiled with "-fprofile-generate=/tmp/gcov -lgcov" and ran
the script once. For os.time I got

  Lua5.2: 2.621s

which is a nearly 10% improvement. (LTO, profiling, and a no-op luai_apicheck
all contributed, LTO the most. The difference between -O3 and -O2 wasn't
consistently discernible, nor was the effect of -fomit-frame-pointer.)

I couldn't get the run time down for the inc() external function test, even
when I compiled that module with profiling. But at least for Lua built-ins
you can improve things by enabling compiler optimizations. You can always
build your modules into the interpreter, e.g. by compiling Lua to liblua.a
and then building 

NOTE: All benchmarks done on Ubuntu 14.04/x86_64 and, except where noted,
used the Ubuntu packaged compiler. I just ran each test twice in quick
succession, and took the user time as reported by the time(1) utility of the
second run. If the second run deviated substantially from the first or from
my practice runs, I ran it twice again.

os.time test (time.lua):

  local ncalls_n, unroll_n = ...

  local function optnatural(n, d)
    n = tonumber(n)

    if n and n >= 0 then
      return math.floor(n)
    else
      return d
    end
  end

  local ncalls = 2^optnatural(ncalls_n, 27)
  local unroll = 2^optnatural(unroll_n, 11)
  local nloops = ncalls / unroll

  local code = string.format([[
    local n = ...
    local j = 0
    local time = os.time

    for i=1,n do
      %s
    end

    return j
  ]], string.rep("j = j + time()", unroll, "\n"))

  local f = assert((loadstring or load)(code))
  local j = f(nloops)

  print(string.format("j:%d ncalls:%d unroll:%d", j, ncalls, unroll))

external function test (inc.lua):

  local ncalls_n, unroll_n = ...
  
  local function optnatural(n, d)
  	n = tonumber(n)
  
  	if n and n >= 0 then
  		return math.floor(n)
  	else
  		return d
  	end
  end
  
  local ncalls = 2^optnatural(ncalls_n, 27)
  local unroll = 2^optnatural(unroll_n, 11)
  local nloops = ncalls / unroll
  
  local code = string.format([[
  	local n = ...
  	local j = 0
  	local ver = string.match(_VERSION, "(%%d%%.%%d)"):gsub("%%.", "")
  	local inc = require(string.format("inc%%s", ver))
  
  	for i=1,n do
  		%s
  	end
  
  	return j
  ]], string.rep("j = inc(j)", unroll, "\n"))
  
  local f = assert((loadstring or load)(code))
  local j = f(nloops)
  
  print(string.format("j:%d ncalls:%d unroll:%d nloops:%d", j, ncalls, unroll, nloops))

external function test (inc.c):

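  // Build once per Lua version (inc{51,52}.so is shorthand, not a single command):
  // cc -o inc51.so inc.c -O3 -fPIC -shared -I/usr/include/lua5.1
  // cc -o inc52.so inc.c -O3 -fPIC -shared -I/usr/include/lua5.2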
  #include <lua.h>
  
  static int inc(lua_State *L) {
  	lua_Number i = lua_tonumber(L, 1);
  	// NOTE: calling lua_pop(L, 1) gives worse numbers in all VMs
  	lua_pushnumber(L, i + 1);
  	return 1;
  } /* inc() */
  
  int luaopen_inc(lua_State *L) {
  	lua_pushcfunction(L, &inc);
  	return 1;
  }
  
  int luaopen_inc51(lua_State *L) {
  	return luaopen_inc(L);
  }
  
  int luaopen_inc52(lua_State *L) {
  	return luaopen_inc(L);
  }
  
  int luaopen_inc53(lua_State *L) {
  	return luaopen_inc(L);
  }