

> Also there's something weird in this example, if I set iter=20000000 (20 million) then it needs below <0.5 second to complete whole loop, when GCC compiled example using SSE intrinsics needs over 2 seconds to do the same and intrinsics free implementation needs over 2.2 seconds.

I am sorry, that sentence is irrelevant. I re-ran the test with GCC 4.6.2 instead of the GCC-LLVM/Clang shipped with Apple's Xcode 4.2. Built with GCC 4.6.2, the SSE intrinsics code from http://fhtr.blogspot.com/2010/02/4x4-float-matrix-multiplication-using.html does 20M iterations in ~0.25 sec and the regular non-SSE code in 0.5 sec, which is on par with the LuaJIT implementation (also 0.5 sec) from my previous mail. (The C benchmark is available at: https://github.com/nanoant/ssebench)
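For readers without the repository at hand, the plain (non-SSE) case being timed is roughly a loop like the one below. This is only a minimal sketch and not the code from ssebench: the names mat4_mul and bench_mat4, the row-major layout, and the test matrices (an identity multiplied repeatedly by a cyclic permutation, so the values stay exact) are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: out = a * b for 4x4 row-major float matrices. */
void mat4_mul(const float a[16], const float b[16], float out[16])
{
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[r * 4 + k] * b[k * 4 + c];
            out[r * 4 + c] = sum;
        }
    }
}

/* Hypothetical benchmark driver: multiplies `iters` times, copies the
 * final matrix into `result`, and returns the elapsed CPU time in seconds. */
double bench_mat4(long iters, float result[16])
{
    float a[16] = {0}, b[16] = {0}, tmp[16];

    assert(iters >= 0);

    /* a starts as the identity; b is a cyclic permutation, so b^4 == identity. */
    for (int r = 0; r < 4; r++) {
        a[r * 4 + r] = 1.0f;
        b[r * 4 + (r + 1) % 4] = 1.0f;
    }

    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        mat4_mul(a, b, tmp);
        memcpy(a, tmp, sizeof a);   /* feed the product back in as the next input */
    }
    clock_t end = clock();

    memcpy(result, a, sizeof a);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling bench_mat4(20000000, result) from main and printing the returned seconds gives a figure comparable in kind to the ones quoted above. The actual numbers depend heavily on the compiler and optimization flags, which is the point of this mail.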

Still, the LuaJIT JIT error described in my previous mail exists: a11 is cleared to 0 when the JIT kicks in. The performance problem, however, lies not with LuaJIT but with Xcode 4.2's Clang (which produces 8x slower code than GCC 4.6.2 and also crashes with -O>1!).

Regards,
-- 
Adam