• Subject: Re: Avoiding LuaJIT FFI allocations (Was: LuaJIT - Is ffi.alloca possible?)
• Date: Sun, 5 Feb 2012 22:10:16 +0100

```Another experiment: an madd function that takes 2x16 arguments and returns 16 results shows that LuaJIT is super fast when it works on the stack only ("done in 0.104428 seconds.") and produces clear, short assembly code, but the source code is really obscure.

If only I could achieve the same results as in 'margmadd.lua' below using a simple operator-driven syntax:
for i = 1, 20000000 do
  A = A + B
end

I believe, however, that these would need to be implemented at a low level, for example via GCC vector types (declared with vector_size). These can already be metatypes, but they are read-only (raising "attempt to write to constant location" on write) and define no operators (raising "attempt to perform arithmetic on 'float __attribute__((vector_size(64)))' and 'float __attribute__((vector_size(64)))'" when trying to add two of them).
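
For comparison, a minimal sketch of the operator syntax using ffi.metatype on a plain struct instead of a vector type. The type name 'mat4', its layout, and the shorter loop count are assumptions made for illustration only. Each addition here creates a new cdata object, which is exactly the allocation this thread is trying to avoid:

local ffi = require("ffi")

-- Hypothetical 4x4 float matrix stored as 16 consecutive floats.
ffi.cdef[[
typedef struct { float m[16]; } mat4;
]]

local mat4
mat4 = ffi.metatype("mat4", {
  -- Element-wise addition; returns a freshly allocated result.
  __add = function(a, b)
    local r = mat4()  -- new cdata on every call (zero-initialized)
    for i = 0, 15 do
      r.m[i] = a.m[i] + b.m[i]
    end
    return r
  end,
})

local A = mat4()
local B = mat4()
for i = 0, 15 do B.m[i] = i + 1 end

for i = 1, 1000 do
  A = A + B
end
print(A.m[0], A.m[15])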

local function mat4add(a11, a21, a31, a41,
                       a12, a22, a32, a42,
                       a13, a23, a33, a43,
                       a14, a24, a34, a44,

                       b11, b21, b31, b41,
                       b12, b22, b32, b42,
                       b13, b23, b33, b43,
                       b14, b24, b34, b44)
  return a11 + b11, a21 + b21, a31 + b31, a41 + b41,
         a12 + b12, a22 + b22, a32 + b32, a42 + b42,
         a13 + b13, a23 + b23, a33 + b33, a43 + b43,
         a14 + b14, a24 + b24, a34 + b34, a44 + b44
end

local a11, a21, a31, a41 = 0, 0, 0, 0
local a12, a22, a32, a42 = 0, 0, 0, 0
local a13, a23, a33, a43 = 0, 0, 0, 0
local a14, a24, a34, a44 = 0, 0, 0, 0

local b11, b21, b31, b41 = 1, 2, 3, 4
local b12, b22, b32, b42 = .1, .2, .3, .4
local b13, b23, b33, b43 = -1, -2, -3, -4
local b14, b24, b34, b44 = 1.1, 1.2, 1.3, 1.4

local start = os.clock()
for i = 1, 20000000 do
  a11, a21, a31, a41,
  a12, a22, a32, a42,
  a13, a23, a33, a43,
  a14, a24, a34, a44 = mat4add(a11, a21, a31, a41,
                               a12, a22, a32, a42,
                               a13, a23, a33, a43,
                               a14, a24, a34, a44,
                               b11, b21, b31, b41,
                               b12, b22, b32, b42,
                               b13, b23, b33, b43,
                               b14, b24, b34, b44)
end
print(string.format('done in %f seconds', os.clock()-start))
print(a11, a21, a31, a41)
print(a12, a22, a32, a42)
print(a13, a23, a33, a43)
print(a14, a24, a34, a44)

Cheers,
--