lua-users home
lua-l archive

I took a different approach: instead of relying on LuaJIT to return fresh scalars or vectors, I pass in the destination objects myself, which means I have to make sure they are preallocated in some way:

	https://github.com/malkia/ufo/blob/master/lib/v3math.lua

It can also return "new" values, but because those are expensive, the allocating functions are suffixed with "new" (to make the cost obvious, and to require more typing). For example, mulnew(a, b) returns a new vector3, while mul(result, a, b) writes into the result that was passed in.
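To make the two calling conventions concrete, here is a minimal sketch, assuming a plain three-float struct. The names vec3, mul and mulnew are used for illustration only and are not copied from v3math.lua, whose actual definitions may differ.

```lua
local ffi = require('ffi')

-- Illustrative type only; the real library may lay out its vectors differently.
local vec3 = ffi.typeof('struct { float x, y, z; }')

-- Destination-passing variant: writes the component-wise product of a and b
-- into result and allocates nothing.
local function mul(result, a, b)
   result.x = a.x * b.x
   result.y = a.y * b.y
   result.z = a.z * b.z
   return result
end

-- Allocating variant: creates a fresh vec3 on every call.
local function mulnew(a, b)
   return mul(vec3(), a, b)
end

local a, b = vec3(1, 2, 3), vec3(4, 5, 6)
local tmp = vec3()            -- preallocated once, outside the hot loop
for i = 1, 1000 do
   mul(tmp, a, b)             -- no new cdata is created per iteration
end
print(tmp.x, tmp.y, tmp.z)
```

The caller owns the temporaries, so a hot loop can keep reusing the same few objects; the price is that expressions like a * b * c have to be spelled out as explicit calls.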

I was satisfied with the results, because there was no memory allocation, and even the assembly looked good (but then Mike came and said - NOT GOOD ENOUGH :) :) :) :) - but to me... juuuust fine!)

	http://article.gmane.org/gmane.comp.lang.lua.general/81280

	Here is some more on the topic:

	http://www.reddit.com/r/programming/comments/iup0m/beautiful_assembly_luajit/

On 2/3/2012 3:34 PM, Adam Strzelecki wrote:
Hello,

I have a problem with LuaJIT FFI allocations in my OpenGL LuaJIT FFI framework, similar to the one described in the "LuaJIT - Is ffi.alloca possible?" thread from last year.

I got "mat4" (GLSL mat4 equivalent) type implemented as FFI metatype "struct { GLfloat m11 … m44; }". Everything works fine, however when I want to draw many objects, each having different model matrix, I need to pre-calculate:

   shader.modelView = view * model
   shader.modelViewProjection = projection * modelView

Both of these call the mat4MT.__mul function, which internally calls mat4() (ffi.new) to create the result. Unfortunately, allocation takes most of the time here; all other calculations are negligible in comparison.

After these are pre-calculated, shaderMT.__newindex loads them into OpenGL using UniformMatrix4fv, which requires me to call ffi.cast(GLfloatp, matrix), as otherwise FFI complains about an incompatible argument. So again it seems to do another allocation and copy there. After I send these to OpenGL I do not store the values anywhere, so they are discarded in my program.

Is there any gentle way to avoid these allocations? If I disable this pre-calculation I get around ~1000FPS instead of 40 in my program.

Would allocation sinking that is planned for LuaJIT help in this case? In C++ I would use some classes allocated on stack, so no need for heap allocator.

Just to demonstrate that FFI allocation is two orders of magnitude slower than simple operations on locals:

local ffi = require('ffi')
local mat4 = ffi.typeof('float[16]')

local test
local start  = os.clock()
for i = 1, 20000000 do
   test = mat4(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
end
print(string.format('allocation took %f seconds', os.clock()-start))

local t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
local start  = os.clock()
for i = 1, 20000000 do
   t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16 = t1 + 1, t2 + 2, t3 + 3, t4 + 4, t5 + 5, t6 + 6, t7 + 7, t8 + 8, t9 + 9, t10 + 10, t11 + 11, t12 + 12, t13 + 13, t14 + 14, t15 + 15, t16 + 16
end
end
print(string.format('assignment took %f seconds', os.clock()-start))
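A third variant that may be worth timing next to the two above is reusing a single preallocated matrix. This is only a sketch under the same float[16] assumption; the names buf and N are introduced here for illustration, and the timing should be treated as a rough indication, since the JIT may optimize such a simple loop differently from real code.

```lua
local ffi = require('ffi')
local mat4 = ffi.typeof('float[16]')

local N = 20000000        -- same iteration count as the loops above
local buf = mat4()        -- hypothetical name: allocated once, before the loop

local start = os.clock()
for i = 1, N do
   -- overwrite the 16 elements in place instead of creating a new cdata
   for j = 0, 15 do
      buf[j] = j + 1
   end
end
print(string.format('reuse took %f seconds', os.clock() - start))
```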

Best regards,