lua-users home
lua-l archive



Sorry to reply to my own post. I just want to apologize: obviously I
didn't know what I was talking about in my last post.

I've updated the code to use the bitop library, as Mike suggested in
his post, because the reason the 2D VLA was slower than the 1D VLA in
my case was probably the conditionals.

https://github.com/archilifelin/mysandbox/blob/master/lua/lifegame_jit_ffi_2d.lua
The only change is around lines 83~84. Now this code completes in
0.4~0.5 seconds, and there is no dramatic worst case. (With the
conditional rulesets it took about 1 second, with a worst case of
around 3 seconds.)

This line:
new[y][x] = old[y][x] > 0 and rule1[count] or rule2[count]

slows the code down considerably. While testing these scripts I made
a typo that turned it into:

new[y][x] = old[y][x] and rule1[count] or rule2[count]

Although the result is obviously wrong, the speed is fine. I don't
know what's going on with that "greater than": with an FFI VLA it is
sometimes even slower than a plain Lua table (the worst case I
reported in my last post).
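For anyone curious, here is the kind of branchless ruleset lookup I
mean, sketched in C since it's easy to show standalone (the merged
table and the function names are my own illustration, not the actual
code from the repo). The cell state, kept as 0 or 1, selects a row of
the rule table directly, so no comparison or branch is needed:

```c
#include <stdint.h>

/* Hypothetical sketch: rule1/rule2 from the Lua code merged into one
 * 2x9 table, indexed by [cell state][live neighbor count]. */
static const uint8_t rules[2][9] = {
    {0, 0, 0, 1, 0, 0, 0, 0, 0},  /* dead cell: born with exactly 3 neighbors */
    {0, 0, 1, 1, 0, 0, 0, 0, 0},  /* live cell: survives with 2 or 3 neighbors */
};

/* Branchy version, mirroring `old[y][x] > 0 and rule1[count] or rule2[count]`. */
static uint8_t next_branchy(uint8_t cell, int count) {
    return cell > 0 ? rules[1][count] : rules[0][count];
}

/* Branchless version: the cell value itself picks the row, similar to
 * masking it with band() in bitop and indexing with the result. */
static uint8_t next_branchless(uint8_t cell, int count) {
    return rules[cell & 1][count];
}
```

Both functions compute the same next state; the second just replaces
the conditional with pure indexing, which is the same effect the bitop
change has in the Lua version.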

https://github.com/archilifelin/mysandbox/blob/master/lua/lifegame_jit_ffi.lua
I applied the same change to the 1D VLA too; it runs in 0.6 seconds or so.

So yes, n-dimensional arrays are faster than 1D-simulated ones as far
as FFI cdata is concerned. The coding style looks better as well. I
think this concludes my original question.
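To be clear about what I mean by "1D-simulated": the two layouts
address exactly the same cells; the only difference is whether the
compiler derives the row offset from the array type or you write
y*W + x by hand. A minimal C illustration (names and sizes are my
own, not from the repo):

```c
#include <stdint.h>

enum { W = 64, H = 48 };  /* arbitrary grid size for the example */

/* n-dimensional layout: the row offset comes from the array type. */
static uint8_t grid2d[H][W];

/* 1D-simulated layout: the index y*W + x is spelled out by hand. */
static uint8_t grid1d[H * W];

static void set_both(int x, int y, uint8_t v) {
    grid2d[y][x] = v;       /* compiler computes the offset */
    grid1d[y * W + x] = v;  /* we compute it ourselves */
}

static uint8_t get2d(int x, int y) { return grid2d[y][x]; }
static uint8_t get1d(int x, int y) { return grid1d[y * W + x]; }
```

Memory-wise the two are identical, so any speed difference comes from
the index expressions the compiler (or the JIT) gets to see.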

As a side note, I applied bitops to the plain Lua table version of
this code and there's no visible performance gain; it still runs in
about 1.5 seconds. I applied it to the C++ version too, and it's
somewhat faster:
https://github.com/archilifelin/mysandbox/blob/master/cpp/lifegame1_auto_padding.cpp
The modified lines are around 58~59. It runs in 0.8 seconds or so with
gcc -O2, so LuaJIT FFI beats gcc -O2 in this case. : )

However, I doubt gcc -O3 is beatable here, since in my case all the
allocations are fixed-size arrays on the stack plus constants... -O3
could probably optimize all of them very aggressively. That's quite
off topic though, sorry for the noise : )

Sincerely,
Johnson Lin