- Subject: Re: 5.1 a little slower than 5.0.2
- From: Mike Pall <mikelu-0409@...>
- Date: Sat, 25 Sep 2004 04:49:55 +0200
Hi,
Tom Spilman wrote:
> for j=n,1,-1 do
> y[j] = y[j] + x[j]
> end
The execution time is dominated by the performance of the inner loop and
this in turn is dominated by the performance of gettable/settable for
numeric keys.
This code path is highly sensitive to the exact control flow, the compiler
settings and nearly intractable things like cache effects.
I compiled lua-5.0.2, lua-5.1-work0 and lua-5.1-work2 with gcc 3.3 and
three different compiler settings. Additionally lvm.c was compiled
with -fno-crossjumping. Here is what I got on a Pentium III 1139 MHz
for n=10000 (time in seconds averaged over many runs, lower is better):
                 -O     -O2    -O3 -fomit-frame-pointer
lua-5.0.2        3.30   3.40   3.60
lua-5.1-work0    3.72   4.30   3.44
lua-5.1-work2    4.03   4.18   3.50
Certainly confusing.
Looking at the assembler output reveals a few things:
- The inlining of luaH_getany() in lua-5.1-work2 (compared to work0)
  slows down the fast code path. Recoding this with a series of if's
  instead of a switch may help the compiler.
- The inlining of luaH_getany() increased the size and the dependencies
  for luaH_get() which in turn stops the compiler from completely
  inlining it into luaH_set(). This accounts for another slowdown.
- Even though lua_number2int uses inline assembler for gcc on x86
  the compiler generates pretty ugly code for these two lines:
      lua_number2int(k, (nvalue(key)));
      if (cast(lua_Number, k) == nvalue(key)) return luaH_getnum(t, k);
  A combined inline assembler replacement may speed up the code quite
  a bit. Anyone know what the fastest way to determine whether a double
  fits into an int is on x86?
- The code in ltable.c would benefit a lot from using likely()/unlikely()
  macros (see the Linux kernel). The compiler has a hard time guessing
  the most likely executed branches.
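To make the last two points concrete, here is a minimal sketch in plain C. It is not the Lua source and not a tuned x86 answer to the question above. The likely()/unlikely() macros follow the usual GCC idiom built on __builtin_expect. The helper double_to_int_exact() is a hypothetical name for a portable "does this double fit into an int" check. It assumes that integer-valued keys are the common case, so the failure branches are the ones marked unlikely.

```c
#include <limits.h>

/* Branch hints in the style of the Linux kernel macros.
 * __builtin_expect is a GCC builtin; other compilers get a no-op fallback. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical helper (not part of Lua): returns 1 and stores the value in
 * *out if d is exactly representable as an int, otherwise returns 0.
 * Assumption: integer-valued keys are the common case. */
static int double_to_int_exact(double d, int *out) {
  int k;
  /* Range check first. NaN fails both comparisons, and converting an
   * out-of-range double to int is undefined behaviour in C. */
  if (unlikely(!(d >= (double)INT_MIN && d <= (double)INT_MAX)))
    return 0;
  k = (int)d;                     /* truncates toward zero */
  if (unlikely((double)k != d))   /* a fractional part was lost */
    return 0;
  *out = k;
  return 1;
}
```

A caller on the fast path would then read something like: if (likely(double_to_int_exact(n, &k))) take the array lookup, else fall back to the generic hash lookup. Whether the hints actually help has to be measured, given how sensitive this code path is to compiler settings.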
Bye,
Mike