lua-l archive

Hi,

Tom Spilman wrote:
> 		for j=n,1,-1 do
> 			y[j] = y[j] + x[j]
> 		end

The execution time is dominated by the performance of the inner loop, and
this in turn is dominated by the performance of gettable/settable for
numeric keys.

This code path is highly sensitive to the exact control flow, the compiler
settings, and nearly intractable factors such as cache effects.

I compiled lua-5.0.2, lua-5.1-work0 and lua-5.1-work2 with gcc 3.3 using
three different compiler settings. Additionally, lvm.c was compiled
with -fno-crossjumping. Here is what I got on a 1139 MHz Pentium III
for n=10000 (time in seconds, averaged over many runs; lower is better):

gcc flags       -O     -O2    -O3 -fomit-frame-pointer
lua-5.0.2       3.30   3.40   3.60
lua-5.1-work0   3.72   4.30   3.44
lua-5.1-work2   4.03   4.18   3.50

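Certainly confusing: -O2 is slower than -O for every version, and
lua-5.0.2 is the fastest of the three with -O and -O2 but the slowest
with -O3 -fomit-frame-pointer.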

Looking at the assembler output reveals a few things:

- The inlining of luaH_getany() in lua-5.1-work2 (compared to work0)
  slows down the fast code path. Recoding this as a chain of if
  statements instead of a switch may help the compiler.
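To make the switch-versus-if point concrete, here is a hypothetical,
heavily simplified C sketch. key_type, key_value, path and the classify_*
functions are made-up stand-ins, not Lua's TValue or luaH_getany(). Both
functions choose the same path; the if-chain only differs in testing
numeric keys first. Whether gcc really emits better code for it has to be
checked in the assembler output.

```c
/* Hypothetical, heavily simplified key type; the real Lua TValue and
   luaH_getany() look different. */
enum key_type { KEY_NIL, KEY_BOOLEAN, KEY_NUMBER, KEY_STRING, KEY_OTHER };

typedef struct {
  enum key_type tt;
  double n;            /* valid when tt == KEY_NUMBER */
} key_value;

/* Which lookup path a key would take; stands in for the real lookups. */
enum path { PATH_NONE, PATH_NUMBER, PATH_STRING, PATH_GENERIC };

/* switch form: the compiler may emit a jump table and gets no hint
   that KEY_NUMBER is the hot case. */
static enum path classify_switch(const key_value *k) {
  switch (k->tt) {
    case KEY_NIL:    return PATH_NONE;
    case KEY_NUMBER: return PATH_NUMBER;
    case KEY_STRING: return PATH_STRING;
    default:         return PATH_GENERIC;
  }
}

/* if-chain form: the hot case (numeric keys) is tested first, so the
   fast path can be a single compare-and-branch, provided the compiler
   keeps the source order. */
static enum path classify_ifs(const key_value *k) {
  if (k->tt == KEY_NUMBER) return PATH_NUMBER;
  if (k->tt == KEY_STRING) return PATH_STRING;
  if (k->tt == KEY_NIL)    return PATH_NONE;
  return PATH_GENERIC;
}
```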

- The inlining of luaH_getany() increased the size and the dependencies
  of luaH_get(), which in turn prevents the compiler from fully
  inlining it into luaH_set(). This accounts for another slowdown.
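One common way to keep a lookup small enough to be inlined is to move its
rare path into a separate function marked noinline (a GCC/Clang
attribute). The sketch below assumes that approach; the table type,
get_num(), get_slow() and set_num() are hypothetical, with set_num()
playing the role of luaH_set(). It is not how ltable.c is organized.

```c
/* Hypothetical table: an array part for keys 1..size plus a single
   fallback slot standing in for the hash part. Not Lua's struct Table. */
typedef struct {
  double *array;
  unsigned int size;
  double fallback;
} table;

/* GCC/Clang attribute; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Rare path, deliberately kept out of line so its body is never
   copied into get_num(). */
static NOINLINE double *get_slow(table *t, int key) {
  (void)key;                 /* a real table would hash the key here */
  return &t->fallback;
}

/* Hot path: one range check and an index, small enough to be inlined
   into set_num() and other callers. */
static inline double *get_num(table *t, int key) {
  /* assuming size <= INT_MAX, true iff 1 <= key <= size */
  if ((unsigned int)key - 1u < t->size)
    return &t->array[key - 1];
  return get_slow(t, key);
}

/* Plays the role of luaH_set(): it reuses the lookup to find the slot. */
static void set_num(table *t, int key, double v) {
  *get_num(t, key) = v;
}
```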

- Even though lua_number2int uses inline assembler for gcc on x86,
  the compiler generates pretty ugly code for these two lines:

    lua_number2int(k, (nvalue(key)));
    if (cast(lua_Number, k) == nvalue(key)) return luaH_getnum(t, k);

  A combined inline assembler replacement for both lines may speed up
  the code quite a bit. Does anyone know the fastest way on x86 to
  determine whether a double fits into an int?
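One well-known candidate (not benchmarked here, and not necessarily the
fastest) is the "magic number" trick: adding 1.5 * 2^52
(6755399441055744.0) to a double with |d| < 2^51 moves it into
[2^52, 2^53), where consecutive doubles are exactly 1 apart, so the low
32 bits of the sum's bit pattern hold d rounded to an integer. A single
comparison against d then rejects fractions, out-of-range values, NaN and
infinities. The sketch below assumes IEEE-754 doubles, a 32-bit int, and
doubles sharing byte order with 64-bit integers (true on x86);
num2int_portable() and num2int_magic() are hypothetical names, with the
portable version included as a reference.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Portable reference (assumes 32-bit int): returns 1 and stores the value
   iff d holds an integer representable as an int. The range test comes
   first because casting an out-of-range double to int is undefined
   behaviour; NaN fails it as well. */
static int num2int_portable(double d, int *out) {
  if (!(d >= (double)INT_MIN && d <= (double)INT_MAX))
    return 0;
  int k = (int)d;              /* truncates toward zero; now in range */
  if ((double)k != d)
    return 0;                  /* d had a fractional part */
  *out = k;
  return 1;
}

/* Magic-number version: no range test and no double->int cast. */
static int num2int_magic(double d, int *out) {
  double t = d + 6755399441055744.0;   /* 1.5 * 2^52 */
  uint64_t bits;
  memcpy(&bits, &t, sizeof bits);      /* bit pattern, without aliasing UB */
  uint32_t lo = (uint32_t)bits;        /* low 32 bits of the mantissa */
  int32_t k;
  memcpy(&k, &lo, sizeof k);           /* reinterpret as two's complement */
  /* Only an int-valued d survives this, and for such d the addition
     above is exact, so k == d. */
  if ((double)k != d)
    return 0;
  *out = (int)k;
  return 1;
}
```

The final comparison is the same test as in the second line quoted above,
so the trick only replaces the conversion; whether it beats the existing
inline assembler on a Pentium III would have to be measured.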

- The code in ltable.c would benefit a lot from likely()/unlikely()
  macros (as used in the Linux kernel). The compiler has a hard time
  guessing which branches are most likely to be taken.
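In the Linux kernel these macros are thin wrappers around gcc's
__builtin_expect. A minimal sketch of that pattern, with a fallback for
compilers that lack the builtin; num_table, array_slot() and
get_or_default() are hypothetical and only mimic the shape of an
array-part lookup in ltable.c:

```c
#include <stddef.h>

/* Branch hints in the style of the Linux kernel. __builtin_expect is a
   GCC (and Clang) builtin; other compilers just see the plain condition. */
#if defined(__GNUC__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

/* Hypothetical table layout for illustration; not Lua's struct Table. */
typedef struct {
  double *array;        /* values for keys 1..size */
  unsigned int size;
} num_table;

/* Returns the slot for an integer key, or NULL when the key lies outside
   the array part (where a real table would go on to its hash part).
   Keys inside the array part are marked as the expected case. */
static double *array_slot(num_table *t, int key) {
  if (likely((unsigned int)key - 1u < t->size))
    return &t->array[key - 1];
  return NULL;
}

/* A miss is marked as the rare case. */
static double get_or_default(num_table *t, int key, double dflt) {
  double *slot = array_slot(t, key);
  if (unlikely(slot == NULL))
    return dflt;
  return *slot;
}
```

The hints only change how gcc lays out the branches (which side falls
through); they never change the result.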

Bye,
     Mike