> The indirect, table-based branch to the opcode is the real
> problem. It's highly unpredictable since a switch () usually only
> adds one dispatch point.
> 
> Using (indirect) threaded dispatch (e.g. with the labeled goto
> extension of GCC or in assembler) replicates the dispatch. This
> makes it much less likely to get branch prediction misses.

The VM code now uses macros for the cases in the switch, so it
is not difficult to make it use the labeled goto extension of
GCC. Unfortunately, GCC "optimizations" killed any possible gain. Each
case ends with the same code, which jumps to the next instruction. Even
with -O0, GCC optimized all those repeated fragments to use a single
copy of a "jmp *%eax" instruction...

The "trick" file looks like this, and must be inserted inside
the luaV_execute function:

/*==================================================================*/
#undef vmdispatch
#undef vmcase
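/* Dispatch by jumping through the label table instead of a switch. */
#define vmdispatch(x)     goto *disptab[x];

/* Each case gets its own label and ends with its own copy of the
   fetch / hook-check / dispatch sequence. */
#define vmcase(c,b)     L_##c: {b}; \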
    i = *(ci->u.l.savedpc++); \
    if ((L->hookmask & (LUA_MASKLINE | LUA_MASKCOUNT)) && \
        (--L->hookcount == 0 || L->hookmask & LUA_MASKLINE)) { \
      Protect(traceexec(L)); \
    } \
    ra = RA(i); \
    vmdispatch(GET_OPCODE(i));


/* One label address per opcode, indexed by opcode number. */
static void *disptab[] = {
  &&L_OP_MOVE,
  &&L_OP_LOADK,
  &&L_OP_LOADBOOL,
  ...
};
/*==================================================================*/
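For readers who want to try the technique outside the Lua sources, here is a minimal standalone sketch of replicated dispatch with GCC's labels-as-values extension. The toy opcodes, the accumulator machine, and the name run_toy_vm are hypothetical and exist only for illustration; none of this is code from lvm.c.

```c
/* Hypothetical opcodes for a toy accumulator machine (not Lua's). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Runs a bytecode program and returns the accumulator.
   Needs GCC or a compatible compiler: it relies on the
   "labels as values" extension (&&label and goto *ptr). */
static int run_toy_vm(const unsigned char *pc) {
  /* One label address per opcode, in the same order as the enum. */
  static void *const disptab[] = {
    &&L_OP_INC, &&L_OP_DEC, &&L_OP_DOUBLE, &&L_OP_HALT
  };
  int acc = 0;

/* Fetch the next opcode and jump to its handler. Because the macro is
   expanded at the end of every handler, the source contains one
   indirect jump per opcode rather than a single shared one. */
#define DISPATCH()  goto *disptab[*pc++]

  DISPATCH();
  L_OP_INC:    acc += 1;  DISPATCH();
  L_OP_DEC:    acc -= 1;  DISPATCH();
  L_OP_DOUBLE: acc *= 2;  DISPATCH();
  L_OP_HALT:   return acc;
#undef DISPATCH
}
```

Calling run_toy_vm on the program { OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT } returns 3. Whether the compiler keeps the jumps separate is a different matter: compiling with "gcc -S" and counting the indirect jmp instructions in the output is one way to check whether they were merged back into a single copy.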

-- Roberto