[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: How to correctly manage a multi-dimension array with LuaJIT FFI ?
- From: Johnson Lin <arch.jslin@...>
- Date: Thu, 10 Feb 2011 06:15:12 +0800
On Thu, Feb 10, 2011 at 2:46 AM, Tony Finch <email@example.com> wrote:
> On Wed, 9 Feb 2011, Mike Pall wrote:
>> - I'm sure there are many more tricks to speed up the Game of
>> Life. There's plenty of code for this out there, take a look.
> See for example
Thank you Mike, for all the optimization tips, I really learned a lot.
And thanks for the game of life examples too, Tony :)
However I was more interested in find out if I mis-used LuaJIT FFI
related API, so I come up with another series of tests, with some tips
Mike gave me in these modifications. There's still quite a few
techniques I just don't want to apply right now, because mostly I want
to figure out if I actually have some 'escaped references' in the
2-dimensional VLA test I did before, so most ugly and stupid parts of
my code stayed the same as before.
Here's the code:
Mainly I suspect the neighbor_count function is the biggest threshold,
but I have no idea where that escaped reference might be... so I just
simply rewrite the neighbor counting method using what Mike said, and
added a padding wrap function (looks bad I know)..... and then, WHOA,
that's already like 10 TIMES faster. So I made the grid bigger, but
still deliberately don't use a power-of-2 size. Otherwise there will
be too much factor contribute to the performance gain and I cannot be
sure what's what.
However the question remains. I didn't try the 1D array version with
this modification, so:
... and I found out that it's even faster, though just a little bit.
However the 2D version above sometimes has dramatic worse cases in
that it will suddenly perform like 3x slower than other test runs. I
noticed that on these runs, the automaton were still active (maybe a
few sliders or it's still morphing without a specific pattern) ...
although it didn't make sense... did it? the 1D version has worse
cases too, but not that dramatic, and is generally a little faster by
Lastly I made similar modifications to the C++ version too:
With gcc (4.5.1) -O3 it's roughly 3x faster than the general case of
both LuaJIT FFI versions. With -O2, then sometimes LuaJIT FFI version
could beat the C++ version. (When I still used a 2D array to store the
neighbor direction and wrap using conditionals, the binaries compiled
with -O2 and -O3 perform at the same speed.)
And again the original pure Lua version, with the same modification:
is roughly 50~60% slower than the FFI version.
ps. I am using a i7-720QM (1.6GHz at least per core) equipped laptop
test results ( 22x22 sized grid, 100000 iterations ) :
with gcc 4.5.1 -O3 roughly 0.25 seconds
with gcc 4.5.1 -O2 roughly 1 seconds
LuaJIT FFI using 2D VLA roughly 1 seconds, with worst case around 3 seconds
LuaJIT FFI using 1D VLA roughly 0.8 seconds, with worst case around 1.5 seconds
plain Lua ver. run with LuaJIT roughly 1.6 seconds
plain Lua ver. run with Lua5.1.4 roughly 70 seconds (It took too
long, I run only once)
That's about it :) I'll try harder to polish my programming skill :p