lua-users home
lua-l archive



Grellier, Thierry wrote:
> I've attached a patch that allows using a kind of direct-threading
> technique in Lua 5.1 (in lvm.c) for anything but i386 with the GCC
> compiler (selection is done in luaconf.h and can be improved, I guess;
> notably I made assumptions for PowerPC... but at least it should
> preserve portability). On i386 it keeps the switch/case.

Umm, this doesn't work. The "const Instruction i = ..." appears
in every block as a new local variable. Ok, easy fix: remove the
"const" from the main declaration and the "const Instruction"
redeclaration in every block.

> A quick test leads me to think it gains up to 5%-10% on
> SPARCs using some benches of
>, but performs worse on i386
> (because of the replicated code in BREAK, I guess, and fewer
> registers). So if anybody wants to play with it and see how it
> performs on their system...

Well, I've looked into the x86 assembler output: GCC is smart
enough to recognize the identical code sequences and merges all
of them into a single instance. This means you still get a single
indirect branch with no advantages for branch prediction.

So you absolutely must compile lvm.c (and only this file) with
-fno-crossjumping or you won't see any effect, no matter how hard
you try.

Still ... the generated code is not faster and has gotten huge
(I-cache bloat). It's better to change the hook check to:

  if (L->hookmask & (LUA_MASKLINE | LUA_MASKCOUNT)) goto activehook; 

... and add the code for the uncommon execution path at the end
(might be beneficial for plain Lua, too).

The register allocator seems to have a hard time with the main
loop on a register-starved x86. Even with -fomit-frame-pointer.
And it makes some unfortunate decisions, too (like spilling ra
before the branch). Moving the StkId ra = RA(i) into every block
helps a bit. Anyway, the generated code is still messy and spills
far too many registers.

I benchmarked this on a PIII and a P4 with GCC 3.3/3.4 and -O3
-fomit-frame-pointer (plus -fno-crossjumping for lvm.c). The
numbers given are the speedup (+) or reduction (-) in percent
against stock Lua 5.1 (compiled with the same options):

Benchmark      PIII   P4
binarytrees    +10    -6    
cheapconcw     +11   +12    
fannkuch         0    +5    
knucleotide     +4    -3    
mandelbrot      +7    -2    
nbody           +6    -6    
nsieve          +7   -24    
nsievebits     +12    -5    
pidigits       +17    +1    
recursive      +15   +11    
regexdna        +2    -5    
revcomp         +7    -8    
spectralnorm    +3   -12    
sumfile         +6   -11    

(All other benchmarks are +-0 because the bottleneck is elsewhere.)

Not really convincing. Especially on the P4, which has deeper
pipelines and should benefit a lot more from the fewer branch
mispredictions. But I guess its I-cache suffers badly from the
code explosion. Well ... it was worth a try.

> Regarding LuaJIT, and referring to the article I mentioned (more
> details here: Does
> LuaJIT use similar techniques to reduce misprediction and/or
> inline code in branches?

LuaJIT compiles to machine code, i.e. it inlines all opcodes. So it
has zero dispatch overhead by definition. And it does a lot more
optimizations, too (like specialization or inlining library
functions). You can have a look at the assembler output with:
  luajit -O -j dump somefile.lua

Just in case you want to compare the above numbers with LuaJIT:
e.g. LuaJIT is 6.74 times faster for mandelbrot, while the patch
above gets you around 1.07 (PIII) or 0.98 (P4) compared to plain
Lua (which is the reference at 1.00).