RE: Implementation of Lua and direct/context threaded code

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: RE: Implementation of Lua and direct/context threaded code
From: "Grellier, Thierry" <t-grellier@...>
Date: Mon, 29 May 2006 19:45:46 +0200

Ok I missed the thread, because searching with "direct threaded"
keywords.
Well 3% gain... I wasn't expecting much for the reasons you mentioned,
but, but I still think this is little effort to support it (cf silly
example below) so why not? 
And supporting it is NOT at the expense of portability though; I contest
that idea, because I think this is more at the expense of
maintainability. And yes I would consider using JIT when speed matters,
but not sure I will only make programs for x86...
It is true that branch misprediction matters, this is why I was quoting
the article claiming they can improve it (with context threading instead
of direct threading) though I haven't looked in depth... I guess that
opcode ordering in switch may also have an influence...

enum { SETR1, SETR2, INCR_R1, COMP_R1_R2, BRANCH_TRUE, NEG_BRANCH,
END_OF_PROGRAM };
#include <stdio.h>

#define GET_OPCODE(i) i

#ifdef DIRECT_THREADED
# define SWITCH(e) goto *dispatches[e];
# define CASE
# define BREAK     goto *dispatches[GET_OPCODE(*pc++)]
#else
# define SWITCH(e) switch(e)
# define CASE      case
# define BREAK     continue
#endif

typedef char Instruction ;

void interprete(Instruction pc[]) {
  int r0 = 0, r1 = 0, r2 = 0;

#ifdef DIRECT_THREADED
  static void* dispatches[] = { &&SETR1, &&SETR2, &&INCR_R1,
&&COMP_R1_R2, &&BRANCH_TRUE, &&NEG_BRANCH, &&END_OF_PROGRAM };
#endif

  for(;;) {
    Instruction i = *pc++;
    SWITCH(GET_OPCODE(i)) {
    CASE SETR1:
      r1 = (int) *pc++;
      printf("[pc=%xd, r1=%d, r2=%d] NSETR1\n", pc, r1, r2);
      BREAK;
    CASE SETR2:
      r2 = (int) *pc++;
      printf("[pc=%xd, r1=%d, r2=%d] NSETR2\n", pc, r1, r2);
      BREAK;
    CASE INCR_R1:
      r1++;
      printf("[pc=%xd, r1=%d, r2=%d] NINCR_R1\n", pc, r1, r2);
      BREAK;
    CASE COMP_R1_R2:
      r0 = r1 == r2;
      printf("[pc=%xd, r1=%d, r2=%d, r0=%d] NCOMP_R1_R2\n", pc, r1, r2,
r0);
      BREAK;
    CASE BRANCH_TRUE:
      if (r0) pc += *pc;
      else pc++;
      printf("[pc=%xd, r1=%d, r2=%d, r0=%d] NBRANCH_TRUE\n", pc, r1, r2,
r0);
      BREAK;
    CASE NEG_BRANCH:
      pc -= *pc;
      printf("[pc=%xd, r1=%d, r2=%d] NEG_BRANCH\n", pc, r1, r2);
      BREAK;
    CASE END_OF_PROGRAM:
     printf("[pc=%xd, r1=%d, r2=%d] END_OF_PROGRAM\n", pc, r1, r2);
     return;
    }
  }
}

int main() {
  static Instruction program[] = {
    SETR1, 1,
    SETR2, 125,
    COMP_R1_R2,
    BRANCH_TRUE, 4,
    INCR_R1,
    NEG_BRANCH, 5,
    END_OF_PROGRAM
    };
  int i;
  for (i = 0; i < 10000; i++)
    interprete(program);
  return 0;
}
-----Original Message-----
From: lua-bounces@bazar2.conectiva.com.br
[mailto:lua-bounces@bazar2.conectiva.com.br] On Behalf Of Mike Pall
Sent: Monday, May 29, 2006 6:33 PM
To: Lua list
Subject: Re: Implementation of Lua and direct/context threaded code

Hi,

Grellier, Thierry wrote:
> I was reading the article: The Implementation of Lua 5.0 and went
> through the usage of switch/case instruction dispatch preferred to
> direct threaded code techniques (bound to gcc usage) for portability
> reasons. I thought that conditional compilation was also key to
> portability more than language... I also guess that a lot of us are
> building our lua interpreter with gcc.
> 
> It is hard to fully understand how much it improves a real application
> in the end, so I was wondering if anyone has experimented with using
> these techniques instead of default lua implementation. I wished I
could
> have had time to do so, but...

http://lua-users.org/lists/lua-l/2004-09/msg00610.html

Summary: not worth it -- at least not on x86.

There's a reason: Lua uses a one-opcode + three-operand bytecode
and operates on a virtual (caller/callee-overlapping) register
file. This means the machine code implementing each opcode is
much "fatter" (compared to a stack VM) and some of the operand
decoding can be moved before the opcode dispatch. This offers
more opportunities for out-of-order scheduling and filling the
pipeline bubbles caused by the branch mispredictions (note that
the direct threaded code technique does not remove all branch
mispredictions either).

Shameless plug: if you want faster execution (at the expense of
portability) then try LuaJIT: http://luajit.luaforge.net/

Bye,
     Mike

Prev by Date: Re: D3DCREATE_FPU_PRESERVE
Next by Date: Best strategy to check all elements of a table
Previous by thread: Re: Implementation of Lua and direct/context threaded code
Next by thread: RE: Implementation of Lua and direct/context threaded code
Index(es):
- Date
- Thread