- Subject: Avoiding LuaJIT FFI allocations (Was: LuaJIT - Is ffi.alloca possible?)
- From: Adam Strzelecki <ono@...>
- Date: Sat, 4 Feb 2012 00:34:51 +0100
Hello,
I have a problem with LuaJIT FFI allocations in my OpenGL LuaJIT FFI framework, similar to the one described in the "LuaJIT - Is ffi.alloca possible?" thread from last year.
I have a "mat4" type (the GLSL mat4 equivalent) implemented as an FFI metatype over "struct { GLfloat m11 … m44; }". Everything works fine; however, when I want to draw many objects, each with a different model matrix, I need to pre-calculate:
shader.modelView = view * model
shader.modelViewProjection = projection * modelView
Both assignments call the mat4MT.__mul function, which internally calls mat4() (ffi.new) to create the result. Unfortunately, the allocation takes most of the time here; all the other calculations are negligible in comparison.
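To make the pattern concrete, here is a minimal sketch of a metatype whose __mul allocates its result on every call. It is not the framework's actual code: the type name mat4_sketch, the flat "float m[16]" layout (instead of the named m11 … m44 fields), the row-major indexing, and the use of plain float for GLfloat are all assumptions made for brevity.

```lua
local ffi = require('ffi')

-- Hypothetical stand-in for the real struct; GLfloat is assumed to be float.
ffi.cdef[[
typedef struct { float m[16]; } mat4_sketch;
]]

local mat4  -- forward declaration so __mul can call the constructor

local mat4MT = {
  __mul = function(a, b)
    -- This constructor call is the per-multiplication allocation in question.
    local r = mat4()
    for i = 0, 3 do
      for j = 0, 3 do
        local s = 0
        for k = 0, 3 do
          -- Row-major indexing is an assumption of this sketch.
          s = s + a.m[i * 4 + k] * b.m[k * 4 + j]
        end
        r.m[i * 4 + j] = s
      end
    end
    return r
  end,
}

mat4 = ffi.metatype('mat4_sketch', mat4MT)

-- Usage: each product below creates one new cdata object.
local view, model = mat4(), mat4()
local modelView = view * model
```

With this shape, "view * model" followed by "projection * modelView" creates two temporary matrices per drawn object.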
After these are pre-calculated, shaderMT.__newindex uploads them to OpenGL using UniformMatrix4fv, which requires me to call ffi.cast(GLfloatp, matrix), as otherwise FFI complains about an incompatible argument. So it seems to do another allocation and copy there. After I send these values to OpenGL I do not store them anywhere, so they are discarded by my program.
Is there any gentle way to avoid these allocations? If I disable this pre-calculation I get about 1000 FPS instead of 40 in my program.
Would the allocation sinking that is planned for LuaJIT help in this case? In C++ I would use classes allocated on the stack, so there would be no need for the heap allocator.
Just to demonstrate that an FFI allocation is two orders of magnitude slower than simple operations on locals:
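For reference, a small sketch of what the cast does, assuming GLfloat is a plain C float and using a hypothetical "float m[16]" struct layout. As far as I know, ffi.cast to a pointer type does not copy the sixteen floats; it creates a small pointer cdata object that refers to the same storage, so any cost comes from that pointer object rather than from copying the matrix.

```lua
local ffi = require('ffi')

-- Assumption: 'float *' stands in for the GLfloatp type mentioned above.
local GLfloatp = ffi.typeof('float *')
-- Hypothetical layout, not the framework's real struct.
local mat4 = ffi.typeof('struct { float m[16]; }')

local matrix = mat4()
matrix.m[0] = 1

-- The cast returns a pointer to the existing matrix storage.
local p = ffi.cast(GLfloatp, matrix)

-- Writing through the pointer changes the original matrix,
-- which shows that no copy of the data was made.
p[0] = 42
print(matrix.m[0])
```

Note that the pointer does not keep the matrix alive: the matrix itself must stay referenced for as long as the pointer is in use.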
local ffi = require('ffi')
local mat4 = ffi.typeof('float[16]')
local test
-- Case 1: allocate and initialize a new 16-float cdata on every iteration
local start = os.clock()
for i = 1, 20000000 do
  test = mat4(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
end
print(string.format('allocation took %f seconds', os.clock()-start))
-- Case 2: update 16 plain Lua locals on every iteration, no allocation
local t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16 = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
local start = os.clock()
for i = 1, 20000000 do
  t1, t2, t3, t4, t5, t6, t7, t8, t9, t10, t11, t12, t13, t14, t15, t16 = t1 + 1, t2 + 2, t3 + 3, t4 + 4, t5 + 5, t6 + 6, t7 + 7, t8 + 8, t9 + 9, t10 + 10, t11 + 11, t12 + 12, t13 + 13, t14 + 14, t15 + 15, t16 + 16
end
print(string.format('assignment took %f seconds', os.clock()-start))
Best regards,
--
Adam Strzelecki