I've attached a patch that adds a kind of direct-threaded dispatch to
Lua 5.1 (in lvm.c) for every target except i386 when building with gcc.
The selection is done in luaconf.h and can probably be improved; notably,
I made an assumption for powerpc, but at least it should preserve
portability. On i386 it keeps the switch/case dispatch.
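
To make the two dispatch styles concrete, here is a minimal sketch on a toy
accumulator VM. It is not taken from the attached patch: the opcodes, the
TOY_USE_THREADED macro and the function names are all hypothetical, and the
threaded variant uses a table of labels indexed by opcode (GNU C "labels as
values"), which is a close relative of direct threading rather than direct
threading proper.

```c
#include <assert.h>

/* Hypothetical toy opcodes; these are not Lua's. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Hypothetical selection macro (assumption: the real patch makes this
   choice in luaconf.h, with its own conditions). */
#if defined(__GNUC__) && !defined(__i386__)
#define TOY_USE_THREADED 1
#else
#define TOY_USE_THREADED 0
#endif

/* Portable dispatch: every opcode returns to one shared switch. */
long run_switch(const unsigned char *pc) {
  long acc = 0;
  for (;;) {
    switch (*pc++) {
      case OP_INC:    acc++;    break;
      case OP_DEC:    acc--;    break;
      case OP_DOUBLE: acc *= 2; break;
      case OP_HALT:   return acc;
      default: assert(0 && "unknown opcode"); return acc;
    }
  }
}

#if TOY_USE_THREADED
/* Threaded dispatch: each opcode body ends with its own indirect jump,
   so the dispatch code is replicated once per opcode.
   Note: no bounds check on the opcode in this sketch. */
long run_threaded(const unsigned char *pc) {
  static void *const labels[] = { &&do_inc, &&do_dec, &&do_double, &&do_halt };
  long acc = 0;
#define NEXT() goto *labels[*pc++]
  NEXT();
  do_inc:    acc++;    NEXT();
  do_dec:    acc--;    NEXT();
  do_double: acc *= 2; NEXT();
  do_halt:   return acc;
#undef NEXT
}
#endif

/* Pick whichever variant the configuration selected. */
long run(const unsigned char *pc) {
#if TOY_USE_THREADED
  return run_threaded(pc);
#else
  return run_switch(pc);
#endif
}
```

Calling run() on the program { OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT }
gives the same result with either variant; only the shape of the generated
branches differs.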

A quick test suggests it gains up to 5%-10% on SPARC with some of the
benchmarks from http://shootout.alioth.debian.org/, but it performs worse
on i386 (because of the code replicated in BREAK, I guess, and the
smaller number of registers).
So if anybody wants to play with it and see how it performs on their
system...

Regarding LuaJIT, and referring to the article I mentioned (more details
here: http://www.cs.toronto.edu/~bv/tcl2005/tcl2005-slides.pdf): does
LuaJIT use similar techniques to reduce mispredictions and/or inline code
in branches?

-----Original Message-----
From: lua-bounces@bazar2.conectiva.com.br
[mailto:lua-bounces@bazar2.conectiva.com.br] On Behalf Of Mike Pall
Sent: Monday, May 29, 2006 6:33 PM
To: Lua list
Subject: Re: Implementation of Lua and direct/context threaded code

Hi,

Grellier, Thierry wrote:
> I was reading the article: The Implementation of Lua 5.0 and went
> through the usage of switch/case instruction dispatch preferred to
> direct threaded code techniques (bound to gcc usage) for portability
> reasons. I thought that conditional compilation was also key to
> portability more than language... I also guess that a lot of us are
> building our lua interpreter with gcc.
> 
> It is hard to fully understand how much it improves a real application
> in the end, so I was wondering if anyone has experimented with using
> these techniques instead of default lua implementation. I wished I
> could have had time to do so, but...

http://lua-users.org/lists/lua-l/2004-09/msg00610.html

Summary: not worth it -- at least not on x86.

There's a reason: Lua uses a one-opcode + three-operand bytecode
and operates on a virtual (caller/callee-overlapping) register
file. This means the machine code implementing each opcode is
much "fatter" (compared to a stack VM) and some of the operand
decoding can be moved before the opcode dispatch. This offers
more opportunities for out-of-order scheduling and filling the
pipeline bubbles caused by the branch mispredictions (note that
the direct threaded code technique does not remove all branch
mispredictions either).
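
A minimal sketch of the instruction format described above, assuming the
Lua 5.1 field layout (6-bit opcode, 8-bit A, 9-bit C, 9-bit B, from low to
high bits). The helper names, the two toy opcodes and the plain long
register file are hypothetical; real Lua registers hold tagged values and
B/C may also refer to constants.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Instr;

/* Field extraction, assuming the Lua 5.1 layout: op | A | C | B. */
static unsigned get_op(Instr i) { return i & 0x3Fu; }
static unsigned get_a(Instr i)  { return (i >> 6) & 0xFFu; }
static unsigned get_c(Instr i)  { return (i >> 14) & 0x1FFu; }
static unsigned get_b(Instr i)  { return (i >> 23) & 0x1FFu; }

/* Pack one opcode and three operands into a single 32-bit word. */
Instr make_abc(unsigned op, unsigned a, unsigned b, unsigned c) {
  return (op & 0x3Fu) | ((a & 0xFFu) << 6) |
         ((Instr)(c & 0x1FFu) << 14) | ((Instr)(b & 0x1FFu) << 23);
}

/* Hypothetical opcode numbers, not Lua's. */
enum { TOY_ADD, TOY_MUL };

/* Execute one instruction: R[A] = R[B] op R[C]. */
void step(long *R, Instr i) {
  /* The destination register is decoded before the dispatch branch,
     so this work can overlap with a mispredicted jump. */
  long *ra = &R[get_a(i)];
  switch (get_op(i)) {
    case TOY_ADD: *ra = R[get_b(i)] + R[get_c(i)]; break;
    case TOY_MUL: *ra = R[get_b(i)] * R[get_c(i)]; break;
    default: assert(0 && "unknown opcode");
  }
}
```

Each instruction here does a full load-operate-store on the register file,
which is the sense in which an opcode body is "fatter" than the
corresponding push/pop steps of a stack VM.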

Shameless plug: if you want faster execution (at the expense of
portability) then try LuaJIT: http://luajit.luaforge.net/

Bye,
     Mike

Attachment: directthreaded.patch
Description: directthreaded.patch