- Subject: RE: Implementation of Lua and direct/context threaded code
- From: "Grellier, Thierry" <t-grellier@...>
- Date: Mon, 29 May 2006 19:45:46 +0200
OK, I missed that thread because I was searching with the keywords
"direct threaded".
Well, a 3% gain... I wasn't expecting much, for the reasons you
mentioned, but I still think it takes little effort to support it (cf.
the silly example below), so why not?
And supporting it is NOT at the expense of portability; I contest that
idea, because I think it is rather at the expense of maintainability.
And yes, I would consider using a JIT when speed matters, but I am not
sure I will only write programs for x86...
It is true that branch misprediction matters; this is why I quoted the
article, which claims to reduce it (with context threading instead of
direct threading), though I haven't looked at it in depth... I guess
that the ordering of opcodes in the switch may also have an influence...
enum { SETR1, SETR2, INCR_R1, COMP_R1_R2, BRANCH_TRUE, NEG_BRANCH,
END_OF_PROGRAM };
#include <stdio.h>
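#define GET_OPCODE(i) (i)
/* DIRECT_THREADED relies on GCC's labels-as-values extension
   (&&label and goto *ptr); otherwise a plain switch is used. */
#ifdef DIRECT_THREADED
# define SWITCH(e) goto *dispatches[e];
# define CASE
# define BREAK goto *dispatches[GET_OPCODE(*pc++)]
#else
# define SWITCH(e) switch(e)
# define CASE case
# define BREAK continue
#endif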
typedef char Instruction ;
void interprete(Instruction pc[]) {
  int r0 = 0, r1 = 0, r2 = 0;
#ifdef DIRECT_THREADED
  static void* dispatches[] = { &&SETR1, &&SETR2, &&INCR_R1,
    &&COMP_R1_R2, &&BRANCH_TRUE, &&NEG_BRANCH, &&END_OF_PROGRAM };
#endif
  for(;;) {
    Instruction i = *pc++;
    SWITCH(GET_OPCODE(i)) {
      CASE SETR1:
        r1 = (int) *pc++;
        printf("[pc=%p, r1=%d, r2=%d] NSETR1\n", (void *) pc, r1, r2);
        BREAK;
      CASE SETR2:
        r2 = (int) *pc++;
        printf("[pc=%p, r1=%d, r2=%d] NSETR2\n", (void *) pc, r1, r2);
        BREAK;
      CASE INCR_R1:
        r1++;
        printf("[pc=%p, r1=%d, r2=%d] NINCR_R1\n", (void *) pc, r1, r2);
        BREAK;
      CASE COMP_R1_R2:
        r0 = r1 == r2;
        printf("[pc=%p, r1=%d, r2=%d, r0=%d] NCOMP_R1_R2\n", (void *) pc,
               r1, r2, r0);
        BREAK;
      CASE BRANCH_TRUE:
        /* the operand is a forward offset relative to the operand slot */
        if (r0) pc += *pc;
        else pc++;
        printf("[pc=%p, r1=%d, r2=%d, r0=%d] NBRANCH_TRUE\n", (void *) pc,
               r1, r2, r0);
        BREAK;
      CASE NEG_BRANCH:
        /* the operand is a backward offset relative to the operand slot */
        pc -= *pc;
        printf("[pc=%p, r1=%d, r2=%d] NEG_BRANCH\n", (void *) pc, r1, r2);
        BREAK;
      CASE END_OF_PROGRAM:
        printf("[pc=%p, r1=%d, r2=%d] END_OF_PROGRAM\n", (void *) pc, r1,
               r2);
        return;
    }
  }
}
int main(void) {
  static Instruction program[] = {
    SETR1, 1,
    SETR2, 125,
    COMP_R1_R2,
    BRANCH_TRUE, 4,
    INCR_R1,
    NEG_BRANCH, 5,
    END_OF_PROGRAM
  };
  int i;
  for (i = 0; i < 10000; i++)
    interprete(program);
  return 0;
}
-----Original Message-----
From: lua-bounces@bazar2.conectiva.com.br
[mailto:lua-bounces@bazar2.conectiva.com.br] On Behalf Of Mike Pall
Sent: Monday, May 29, 2006 6:33 PM
To: Lua list
Subject: Re: Implementation of Lua and direct/context threaded code
Hi,
Grellier, Thierry wrote:
> I was reading the article: The Implementation of Lua 5.0 and went
> through the usage of switch/case instruction dispatch preferred to
> direct threaded code techniques (bound to gcc usage) for portability
> reasons. I thought that conditional compilation was also key to
> portability more than language... I also guess that a lot of us are
> building our lua interpreter with gcc.
>
> It is hard to fully understand how much it improves a real application
> in the end, so I was wondering if anyone has experimented with using
> these techniques instead of default lua implementation. I wished I
> could have had time to do so, but...
http://lua-users.org/lists/lua-l/2004-09/msg00610.html
Summary: not worth it -- at least not on x86.
There's a reason: Lua uses a one-opcode + three-operand bytecode
and operates on a virtual (caller/callee-overlapping) register
file. This means the machine code implementing each opcode is
much "fatter" (compared to a stack VM) and some of the operand
decoding can be moved before the opcode dispatch. This offers
more opportunities for out-of-order scheduling and filling the
pipeline bubbles caused by the branch mispredictions (note that
the direct threaded code technique does not remove all branch
mispredictions either).
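To make the point about operand decoding concrete, here is a minimal
sketch of a register VM whose three operands are extracted before the
opcode dispatch. The 8/8/8/8-bit instruction layout, the opcode names,
MAKE_INSN and run() are all assumptions made up for this illustration;
this is not Lua's actual encoding or interpreter loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layout (NOT Lua's real encoding):
   bits 0-7 opcode, bits 8-15 A, bits 16-23 B, bits 24-31 C. */
enum { OP_LOADK, OP_ADD, OP_MUL, OP_RETURN };

#define MAKE_INSN(op, a, b, c) \
    ((uint32_t)(op) | ((uint32_t)(a) << 8) | \
     ((uint32_t)(b) << 16) | ((uint32_t)(c) << 24))

#define NREGS 8

/* Executes instructions until OP_RETURN; returns register A of that
   instruction, or -1 on an unknown opcode. */
int run(const uint32_t *pc)
{
    int reg[NREGS] = {0};

    for (;;) {
        uint32_t insn = *pc++;
        /* All three operands are decoded up front. This work does not
           depend on which opcode is selected, so it can overlap with
           the (possibly mispredicted) dispatch branch below. */
        unsigned a = (insn >> 8) & 0xff;
        unsigned b = (insn >> 16) & 0xff;
        unsigned c = (insn >> 24) & 0xff;
        assert(a < NREGS);

        switch (insn & 0xff) {
        case OP_LOADK:              /* R[a] = b, a small immediate */
            reg[a] = (int)b;
            break;
        case OP_ADD:                /* R[a] = R[b] + R[c] */
            assert(b < NREGS && c < NREGS);
            reg[a] = reg[b] + reg[c];
            break;
        case OP_MUL:                /* R[a] = R[b] * R[c] */
            assert(b < NREGS && c < NREGS);
            reg[a] = reg[b] * reg[c];
            break;
        case OP_RETURN:             /* result is R[a] */
            return reg[a];
        default:
            return -1;
        }
    }
}
```

Each instruction here does a whole "R[a] = R[b] op R[c]" step, where a
stack VM would need several push/pop opcodes (and several dispatches)
for the same work.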
Shameless plug: if you want faster execution (at the expense of
portability) then try LuaJIT: http://luajit.luaforge.net/
Bye,
Mike