Leo Razoumov wrote:
> I would like to give it a try. I implemented complex numbers as
> userdata for the Lua interpreter. But userdata is allocated on the
> heap and, thus, is too slow for tight loops commonly found in
> numerics.
> Bringing down box/unbox overhead could save the day.

Yes, the JIT compiler ought to be able to remove the overhead in
tight loops. It might not work too well for branchy loops, though.

> Also I am a bit worried about function dispatch. Adding two
> doubles is a native Lua opcode and it does not go through the
> trouble of metamethods. Using __add, __mul, etc metamethods
> dispatch for complex numbers is slow. Could it be avoided?

That's an issue for the interpreter, yes. But the JIT compiler
treats metamethod dispatch like any other table lookup. It's
usually able to disambiguate the lookup and hoist it out of the
loop.

[If the complex data type were to be defined as a special kind of
userdata, the JIT compiler could shortcut the dispatch even under
more difficult circumstances.]
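
For reference, here is a minimal sketch of the kind of metamethod
dispatch under discussion. It uses plain Lua tables rather than
userdata, and the names Complex and new are made up for this
illustration; it is not Leo's implementation. Every + in the loop
goes through a lookup of "__add" in the metatable:

```lua
-- Hypothetical complex type built on tables (illustration only).
local Complex = {}
Complex.__index = Complex

local function new(re, im)
  return setmetatable({ re = re, im = im }, Complex)
end

-- Arithmetic metamethods: looked up in the metatable on every + and *.
Complex.__add = function(a, b)
  return new(a.re + b.re, a.im + b.im)
end
Complex.__mul = function(a, b)
  return new(a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re)
end

-- Tight loop: each iteration dispatches through __add.
local z = new(0, 0)
local step = new(1, 2)
for i = 1, 100 do z = z + step end

-- i * i = -1, dispatched through __mul.
local sq = new(0, 1) * new(0, 1)
print(z.re, z.im, sq.re, sq.im)
```

Each operation here also allocates a new table, which is the
box/unbox overhead mentioned above.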

To see the metamethod dispatch hoisting, try this program:

  local t = {}
  for i=1,100 do t[i] = tostring(i) end
  local x = 0
  for i=1,100 do x = x + t[i]:len() end
  print(x)

The dispatch in the second loop first involves a lookup of the
"__index" table in the string metatable. Then "len" is looked up
in this table and the resulting function (string.len) is called.
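
Spelled out by hand, the lookup steps for a method call on a string
look roughly like this (a sketch, assuming the standard string
metatable has not been replaced; the local names are just for
illustration):

```lua
local s = "hello"

-- Step 1: fetch the metatable shared by all strings and its "__index" table.
local mt = getmetatable(s)
local index = mt.__index        -- normally the string library table itself

-- Step 2: look up "len" in that table.
local len = index.len

-- Step 3: call the resulting function with the string as its argument.
print(index == string, len == string.len)
print(len(s), s:len())
```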

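Ok, so save the loop program as test.lua and run it with the
following command (the "i" and "m" flags dump the IR and the
machine code):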

  luajit -jdump=im test.lua

Here's the loop part of the second trace:

->LOOP:
f7f21e20  cmp edi, ecx                      // Array bounds check
f7f21e22  jnb 0xf7f1a010	->2
f7f21e28  cmp dword [eax+edi*8+0x4], -0x05  // Type check for array load
f7f21e2d  jnz 0xf7f1a010	->2
f7f21e33  mov esi, [eax+edi*8]              // temp1 = t[i]
f7f21e36  xorps xmm6, xmm6
f7f21e39  cvtsi2sd xmm6, [esi+0xc]          // temp2 = #temp1
f7f21e3e  addsd xmm7, xmm6                  // x = x + temp2
f7f21e42  add edi, +0x01
f7f21e45  cmp edi, +0x64
f7f21e48  jle 0xf7f21e20	->LOOP
f7f21e4a  jmp 0xf7f1a014	->3

Pretty short, eh? As you can see, all dispatch has been hoisted.
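
As a rough source-level picture of what hoisting means here, the
sketch below does the lookup once, outside the loop, and compares
the result with the original method-call form. This is only an
analogy written by hand: the compiled trace above goes further and
reads the string length field directly instead of calling anything.

```lua
local t = {}
for i = 1, 100 do t[i] = tostring(i) end

-- Original form: the method lookup is written inside the loop.
local x = 0
for i = 1, 100 do x = x + t[i]:len() end

-- Hand-hoisted form: resolve string.len once, before the loop.
local len = string.len
local y = 0
for i = 1, 100 do y = y + len(t[i]) end

-- Both loops sum the lengths of "1" .. "100": 9*1 + 90*2 + 3 = 192.
print(x, y)
```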

--Mike