I'm currently optimising an iteration wrapper I have written for Lupa [1]. It maps Python's iterator protocol to Lua's, so that Lua programs can iterate efficiently over Python objects.
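For readers less familiar with the two protocols: Lua's generic `for v in f, s, var` calls `f(s, var)` repeatedly and stops when the first returned value is `nil`, while a Python iterator signals exhaustion by raising `StopIteration`. The sketch below is a toy model in plain C (no Lua or Python headers) of how such a wrapper maps one onto the other. `py_like_iter`, `lua_style_step` and the other names are hypothetical, an array of ints stands in for the Python iterator, and `NULL` plays the role of `nil`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Python iterator (real code would hold a
   PyObject* and call its __next__ method). */
typedef struct {
    const int *items;
    size_t len;
    size_t pos;
} py_like_iter;

/* Python-style next(): returns 1 and stores the item in *out,
   or returns 0 to play the role of raising StopIteration. */
static int py_like_next(py_like_iter *it, int *out) {
    if (it->pos >= it->len)
        return 0;                       /* StopIteration */
    *out = it->items[it->pos++];
    return 1;
}

/* Lua-style iterator function f(s, var): returns the next control value,
   or NULL (playing the role of nil), which ends a generic for loop. */
static const int *lua_style_step(py_like_iter *s, const int *var, int *slot) {
    (void)var;                          /* the Python iterator tracks its own position */
    assert(s != NULL);
    return py_like_next(s, slot) ? slot : NULL;  /* StopIteration -> nil */
}

/* Mimics `local total = 0; for v in f, s, nil do total = total + v end`. */
static int sum_via_generic_for(py_like_iter *s) {
    int slot, total = 0;
    const int *var = NULL;              /* initial control value: nil */
    while ((var = lua_style_step(s, var, &slot)) != NULL)
        total += *var;
    return total;
}
```

Because the Python iterator keeps its own position, the Lua control variable carries no information here; all the state lives in `s`, which is why the wrapper has to keep the Python iterator reachable from the state object.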

To this end, I need to keep a reference to the Python iterator around. I store it as userdata in the state object of the Lua iterator triple (iterator function, state, control variable), so that Lua garbage-collects the Python iterator when the state object goes out of scope. To get the reference back at each iteration step, I use luaL_checkudata().

Now, the callgrind profiler tells me that almost 25% of the time spent advancing the iterator goes into the call to luaL_checkudata(). Experience has taught me to doubt this number, as valgrind tends to overestimate the cost of memory allocations. However, the real problem is that this function has to create a new string just to perform the userdata name check. That string creation seems to account for almost half of the time spent in luaL_checkudata(), with the other half spent elsewhere in the function. So this really does appear to be a bottleneck.
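To make the per-call cost concrete: a name-based check has to turn the C type name into a string the registry can be searched with, which involves hashing the name, and it does so on every call. The sketch below is a toy model in plain C, not Lua's or LuaJIT's actual code; `check_udata_by_name`, `registry_entry`, `hash_name` and the djb2-style hash are all hypothetical. It counts the string bytes hashed, so the work can be seen growing with the number of iteration steps instead of being paid once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: NOT Lua's or LuaJIT's source code. */
typedef struct { int id; } metatable;   /* identity of a metatable */

typedef struct {
    const metatable *mt;  /* metatable attached to the userdata */
    void *payload;        /* e.g. the wrapped Python iterator */
} userdata;

typedef struct {
    unsigned long hash;   /* precomputed hash of the registered name */
    const char *name;
    const metatable *mt;
} registry_entry;

static unsigned long bytes_hashed = 0;  /* total string bytes processed */

/* Stands in for turning the C string into a lookup key:
   every byte of the name is hashed on every call. */
static unsigned long hash_name(const char *s) {
    unsigned long h = 5381;
    for (; *s != '\0'; s++) {
        h = h * 33u + (unsigned char)*s;
        bytes_hashed++;
    }
    return h;
}

/* Find the metatable registered under tname, then compare identities.
   Returns the payload on a match, NULL otherwise. */
static void *check_udata_by_name(const registry_entry *reg, size_t n,
                                 const userdata *ud, const char *tname) {
    unsigned long h = hash_name(tname);  /* per-call string work */
    for (size_t i = 0; i < n; i++) {
        if (reg[i].hash == h && strcmp(reg[i].name, tname) == 0)
            return ud->mt == reg[i].mt ? ud->payload : NULL;
    }
    return NULL;  /* unknown type name */
}
```

In this model the identity comparison itself is a single pointer compare; everything else in the check exists only to get from the name to the registered metatable.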

Looking at the function in LuaJIT2 (2.0.0-beta5), I see that it basically only calls internal functions, so it's not clear to me what I can do to avoid the call.

Any ideas what I could try to improve this?


[1] Lupa: