lua-l archive

2011/2/23 Francesco Abbate <>:
> luajit2/git HEAD / array impl / 0m0.296s
> luajit2/git HEAD / unroll impl / 0m0.109s
> luajit2/ beta6 / array impl / 0m10.860s
> luajit2/ beta6 / unroll impl / 0m0.109s
> C (GSL) / C opt(*) / array impl / 0m0.206s
> (*) CFLAGS="-O2 -march=native -mfpmath=sse"
> The difference between git HEAD and 2.0-beta6 is huge (~ 100x)
> (compiled trace vs interpreted code I guess). Could you tell us more
> about what you have done in LuaJIT2?

Mike, sorry to bother you again, but I've made a very small change to
the code and I get poor results again (10.86 s instead of 0.296 s).

The change is that I no longer explicitly unroll the last loop, the one
that calculates the error. You will find the modified code attached.

The reason for not unrolling it explicitly is that when the dimension
of the ODE system is very large, it is better to avoid generating a
huge number of unrolled lines of code. On the other hand, I'm not able
for the moment to vectorize the code, because it involves arithmetic on
absolute values, which is outside the standard BLAS operations.

To me there is something problematic here, because adding one small
loop can completely spoil the execution speed. Do you think this
problem could eventually be fixed in LuaJIT2?

For the moment, the only workaround I can think of is to write a C
routine that executes the specific operation I need, but that would be
a sort of defeat... we want to write everything in Lua+FFI :-)
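For concreteness, the kind of C routine mentioned above might look like the sketch below. It is not taken from the attached script: the name `rkf45_error_ratio`, its signature, and the tolerance formula are assumptions. The formula is a simplified form of the standard error test used by GSL's ODE step control (D_i = eps_abs + eps_rel * |y_i|, i.e. with a_y = 1 and a_dydt = 0). It also assumes eps_abs > 0 so that D_i is never zero.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions):
   returns max_i |yerr[i]| / D_i with D_i = eps_abs + eps_rel * |y[i]|.
   A step would typically be accepted when the result is <= 1.
   Assumes eps_abs > 0, otherwise D_i can be zero when y[i] == 0. */
double rkf45_error_ratio(size_t n, const double *y, const double *yerr,
                         double eps_abs, double eps_rel)
{
    double rmax = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* per-component tolerance: the element-wise absolute values
           here are what plain BLAS calls do not express directly */
        double d = eps_abs + eps_rel * fabs(y[i]);
        double r = fabs(yerr[i]) / d;
        if (r > rmax)
            rmax = r;
    }
    return rmax;
}
```

From LuaJIT, such a routine could be compiled into a shared library, declared with `ffi.cdef` and loaded with `ffi.load`. The loop body is the same one that would otherwise be written in Lua over FFI `double` arrays, which is why moving it to C feels like a defeat.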

Attachment: rkf45vec-v2-out.lua
Description: application/binary