lua-users home
lua-l archive

Hi,

Wim Couwenberg wrote:
> Work 6 performs *much* worse than work 5 (and 5.0.2 even!) on the test 
> script below (About 70% (!) slower for sumi and about 30% for sumr.) 
> What's going on?

I have a much smaller script that shows the problem:

local n=1e8      -- change this to get a reasonable delay
--local dummy
--local dummy2
for i=1,n do end

Time this, then remove the comment from one, then from both dummy
variables. If you get vastly different timings, then this is an
alignment issue with the Lua stack on x86. If not, then I really
don't know (did you compile 51w5 and 51w6 with the same options?).

The Lua stack elements are 12 bytes in size on x86, because the
ABI does not require doubles to be aligned to 8 bytes.
But there is a performance hit for unaligned access to doubles.

And 'for i=start,stop,step' deals with three stack slots containing
doubles. If the outer two are unaligned, you pay a larger penalty
than if the inner one is unaligned.
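
To make the slot arithmetic concrete, here is a minimal sketch in C. It
assumes 12-byte slots with the double at the start of each slot, and that
slot 0 sits on an 8-byte boundary (an assumption, the real stack base
may differ). The names SLOT_SIZE, slot_is_misaligned and
misaligned_loop_slots are made up for illustration and are not part of Lua.

```c
#include <stddef.h>

/* Assumed layout: 8-byte value followed by a 4-byte type tag (x86). */
enum { SLOT_SIZE = 12, DOUBLE_ALIGN = 8 };

/* Returns 1 if the double at the start of slot `index` is not 8-byte
   aligned, assuming slot 0 itself starts on an 8-byte boundary. */
static int slot_is_misaligned(size_t index) {
    return (index * SLOT_SIZE) % DOUBLE_ALIGN != 0;
}

/* Counts the misaligned slots among three consecutive slots starting
   at `base`, e.g. the three loop-control slots of a numeric 'for'. */
static int misaligned_loop_slots(size_t base) {
    int count = 0;
    for (size_t i = 0; i < 3; i++)
        count += slot_is_misaligned(base + i);
    return count;
}
```

Under these assumptions an even base leaves only the inner slot
unaligned, while an odd base leaves the outer two unaligned. Declaring
one more local before the loop shifts the base by one slot, which is
why the dummy variables flip the timing.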

Interestingly it only shows under certain circumstances. It seems
to be very sensitive to compiler options, register allocations
and the like. As far as I could analyze this, sometimes the
unaligned access penalty is lost in the FP pipeline, sometimes
it shows. It happens more often with 51w6, though (I use GCC 3.3.4).

Adding __attribute__ ((__aligned__ (8))) to the 'Value' union
(in lobject.h) is a Q&D solution to the problem. But of course
your Lua stacks need 33% more space then (only a problem if
you use lots of coroutines I guess).
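
A rough sketch of where the 33% comes from, for GCC or Clang. The
unions below are simplified stand-ins for the real 'Value' union, and
all type and function names here are hypothetical. The #pragma pack(4)
is only there to emulate the x86 layout (doubles aligned to 4 bytes) so
the sizes come out the same on other targets. It is not part of the
suggested fix.

```c
#include <stddef.h>

/* Emulate the x86 ABI: members aligned to at most 4 bytes. */
#pragma pack(push, 4)
typedef union { void *p; double n; int b; } PackedValue;
typedef struct { PackedValue value; int tt; } PackedSlot;
#pragma pack(pop)

/* Same union, but forced to 8-byte alignment as suggested above. */
typedef union { void *p; double n; int b; }
    __attribute__((__aligned__(8))) AlignedValue;
typedef struct { AlignedValue value; int tt; } AlignedSlot;

/* Size of one slot without the attribute: 8-byte value + 4-byte tag. */
static size_t packed_slot_size(void) { return sizeof(PackedSlot); }

/* Size of one slot with the attribute: padded up to a multiple of 8. */
static size_t aligned_slot_size(void) { return sizeof(AlignedSlot); }

/* Extra stack space in percent, rounded down. */
static size_t slot_growth_percent(void) {
    return (aligned_slot_size() - packed_slot_size()) * 100
           / packed_slot_size();
}
```

The slot grows from 12 to 16 bytes, so every stack slot carries 4 bytes
of padding after the type tag.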

Bye,
     Mike