On Mon, Nov 17, 2008 at 3:38 PM, Peter Cawley <lua@corsix.org> wrote:
> It may be worth looking at the generated Lua opcodes for these benchmarks in
> order to easier see the differences in what is happening in each. For
> example, return true v.s return nil are loadbool,return vs. loadnil,return.
> Then looking at the VM code for these operations, either in C or as the
> assembled output of the C, might make it clearer. Of course, this won't help
> with explaining the luajit results, as it skips the VM when JITing.

Sorry for the late reply.

The opcode listing (via luac -l -l) is indeed very helpful. Chained calls
need fewer instructions, since they do not require an extra MOVE before
each call:

local function chain_local()
  local chain = chain
  chain () () () () () () () () () () -- 10 calls
end

function <chaincallbench2.lua:9,12> (13 instructions, 52 bytes at 0x100fb0)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[10]	GETUPVAL 	0 0	; chain
	2	[11]	MOVE     	1 0
	3	[11]	CALL     	1 1 2
	4	[11]	CALL     	1 1 2
	5	[11]	CALL     	1 1 2
	6	[11]	CALL     	1 1 2
	7	[11]	CALL     	1 1 2
	8	[11]	CALL     	1 1 2
	9	[11]	CALL     	1 1 2
	10	[11]	CALL     	1 1 2
	11	[11]	CALL     	1 1 2
	12	[11]	CALL     	1 1 1
	13	[12]	RETURN   	0 1
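
The benchmark source itself is not quoted in this message, so the
following is only a guess at what chain might look like: a function that
returns itself. That is what lets each CALL leave its result in register
1, ready to be called again with no MOVE in between. The calls counter
is added purely for illustration.

```lua
-- Hypothetical definitions; the real chaincallbench2.lua is not shown here.
local calls = 0

-- Returns itself, so the result of one call can be called again directly.
local function chain()
  calls = calls + 1
  return chain
end

local function chain_local()
  local chain = chain -- cache the upvalue in a local, as in the benchmark
  chain () () () () () () () () () () -- 10 calls
end

chain_local()
print(calls) --> 10
```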

Whereas plain_local and plain_chain_local both need a MOVE before each
call to fetch the function to call:

local function plain_local()
  local plain = plain
  plain ()
  ...
  plain () -- 10 calls
end

local function plain_chain_local()
  local chain = chain
  chain ()
  ...
  chain () -- 10 calls
end

function <chaincallbench2.lua:14,26> (22 instructions, 88 bytes at 0x101190)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[15]	GETUPVAL 	0 0	; plain
	2	[16]	MOVE     	1 0
	3	[16]	CALL     	1 1 1
	4	[17]	MOVE     	1 0
	5	[17]	CALL     	1 1 1
	6	[18]	MOVE     	1 0
	7	[18]	CALL     	1 1 1
	8	[19]	MOVE     	1 0
	9	[19]	CALL     	1 1 1
	10	[20]	MOVE     	1 0
	11	[20]	CALL     	1 1 1
	12	[21]	MOVE     	1 0
	13	[21]	CALL     	1 1 1
	14	[22]	MOVE     	1 0
	15	[22]	CALL     	1 1 1
	16	[23]	MOVE     	1 0
	17	[23]	CALL     	1 1 1
	18	[24]	MOVE     	1 0
	19	[24]	CALL     	1 1 1
	20	[25]	MOVE     	1 0
	21	[25]	CALL     	1 1 1
	22	[26]	RETURN   	0 1

function <chaincallbench2.lua:28,40> (22 instructions, 88 bytes at 0x101460)
0 params, 2 slots, 1 upvalue, 1 local, 0 constants, 0 functions
	1	[29]	GETUPVAL 	0 0	; chain
	2	[30]	MOVE     	1 0
	3	[30]	CALL     	1 1 1
	4	[31]	MOVE     	1 0
	5	[31]	CALL     	1 1 1
	6	[32]	MOVE     	1 0
	7	[32]	CALL     	1 1 1
	8	[33]	MOVE     	1 0
	9	[33]	CALL     	1 1 1
	10	[34]	MOVE     	1 0
	11	[34]	CALL     	1 1 1
	12	[35]	MOVE     	1 0
	13	[35]	CALL     	1 1 1
	14	[36]	MOVE     	1 0
	15	[36]	CALL     	1 1 1
	16	[37]	MOVE     	1 0
	17	[37]	CALL     	1 1 1
	18	[38]	MOVE     	1 0
	19	[38]	CALL     	1 1 1
	20	[39]	MOVE     	1 0
	21	[39]	CALL     	1 1 1
	22	[40]	RETURN   	0 1

Note that in the versions without upvalue caching, MOVE is replaced with
GETUPVAL. From a quick look at the Lua source, MOVE *looks* a bit faster
since it involves fewer lookups:

      case OP_MOVE: {
        setobjs2s(L, ra, RB(i));
        continue;
      }
      case OP_GETUPVAL: {
        int b = GETARG_B(i);
        setobj2s(L, ra, cl->upvals[b]->v);
        continue;
      }
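
To make the MOVE versus GETUPVAL comparison concrete, here is a sketch of
the two variants side by side. The names plain, plain_upvalue and
plain_local, and the count variable, are stand-ins, since the original
benchmark file is not reproduced here. On Lua 5.1, luac -l should show a
GETUPVAL before each call in the first function and a MOVE in the second;
other versions may compile this differently.

```lua
-- Hypothetical stand-in for the benchmarked function.
local count = 0
local function plain()
  count = count + 1
end

-- No caching: every call fetches `plain` from the upvalue (GETUPVAL).
local function plain_upvalue()
  plain ()
  plain ()
  plain ()
end

-- Cached: one GETUPVAL up front, then a register copy (MOVE) per call.
local function plain_local()
  local plain = plain
  plain ()
  plain ()
  plain ()
end

plain_upvalue()
plain_local()
print(count) --> 6
```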

Still, the difference is in tenths of microseconds, and it looks like
both of my benchmark runs used too few iterations to be trusted (only
seconds of total run time)...
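
One way to get more trustworthy numbers might be to grow the iteration
count until a single timed run takes some minimum amount of CPU time, and
only then compute the per-call cost. A rough sketch: bench, min_seconds
and the empty plain are made up for illustration, and this is not the
harness used for the numbers discussed in this thread.

```lua
-- Hypothetical helper: runs f in a loop, doubling the iteration count
-- until one timed run lasts at least min_seconds of CPU time.
local function bench(f, min_seconds)
  local n = 1000
  while true do
    local start = os.clock()
    for _ = 1, n do f() end
    local elapsed = os.clock() - start
    if elapsed >= min_seconds then
      return elapsed / n, n -- seconds per iteration, iterations used
    end
    n = n * 2
  end
end

-- Stand-in for the function under test.
local function plain() end

local per_call, iterations = bench(plain, 0.2)
print(string.format("%d iterations, %.3f us per call",
                    iterations, per_call * 1e6))
```

Note that the loop overhead is included in the result, so it is better
suited to comparing variants against each other than to reading off an
absolute cost per call.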

Alexander.