Hi,

I read http://article.gmane.org/gmane.comp.lang.lua.general/58908 and was
particularly interested in this passage:

- Skip-list chains: The IR is threaded with segregated, per-opcode
  skip-list chains. The links are stored in a multi-purpose 16 bit
  field in the instruction. This facilitates low-overhead lookup
  for CSE, DSE and alias analysis. Back-linking enables short-cut
  searches (average overhead is less than 1 lookup). Incremental
  build-up is trivial. No hashes, no sets, no complex updates.
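
If I'm reading lj_ir.h and lj_jit.h correctly, the data structure being
described is roughly the following (my own simplified sketch; the field
names and the IR__MAX value are abbreviations/placeholders, not the real
definitions):

  #include <stdint.h>

  typedef uint16_t IRRef1;      /* 16-bit reference to an IR instruction */

  enum { IR__MAX = 256 };       /* placeholder for the number of IR opcodes */

  typedef struct IRIns {
    IRRef1 op1, op2;            /* operand references */
    uint16_t ot;                /* opcode and result type packed together */
    IRRef1 prev;                /* previous instruction with the same opcode */
  } IRIns;

  /* Per-opcode chain heads (in jit_State): starting from chain[op] and
     following ->prev visits every instruction with opcode op, newest
     first, so CSE only ever scans instructions of one opcode. */
  IRRef1 chain[IR__MAX];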

I looked at the LuaJIT-2.0.0-beta4 code; this function (from lj_opt_fold.c)
seems to be the relevant one:

  TRef LJ_FASTCALL lj_opt_cse(jit_State *J)
  {
    /* Avoid narrow to wide store-to-load forwarding stall */
    IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
    IROp op = fins->o;
    if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
      /* Limited search for same operands in per-opcode chain. */
      IRRef ref = J->chain[op];
      IRRef lim = fins->op1;
      if (fins->op2 > lim) lim = fins->op2;  /* Relies on lit < REF_BIAS. */
      while (ref > lim) {
        if (IR(ref)->op12 == op12)
          return TREF(ref, irt_t(IR(ref)->t));  /* Common subexpression found. */
        ref = IR(ref)->prev;
      }
    }
    /* Otherwise emit IR (inlined for speed). */
    {
      IRRef ref = lj_ir_nextins(J);
      IRIns *ir = IR(ref);
      ir->prev = J->chain[op];
      ir->op12 = op12;
      J->chain[op] = (IRRef1)ref;
      ir->o = fins->o;
      J->guardemit.irt |= fins->t.irt;
      return TREF(ref, irt_t((ir->t = fins->t)));
    }
  }

I have several questions about all this...

- The quoted paragraph above mentions "skip-list chains", but AFAICT the code
  is just walking an ordinary linked list, not a skip list
  (http://en.wikipedia.org/wiki/Skip_list).  Have I misunderstood something?

- I don't understand the "while (ref > lim)" condition; it looks like the
  list search is terminated early in some cases, but I don't understand what
  those cases are.  The IR is very dense, and pithy code like that is hard
  for newcomers to understand.

- I tried implementing a simple list in Nanojit (used by Mozilla and Adobe),
  which currently uses a hash table for CSE.  I tried it only on 32-bit
  immediates (a simplified sketch of the approach is below), and the
  performance was awful -- there can be dozens of different immediates in a
  single block, so searching through them linearly was much worse than doing
  a hash table lookup, which usually hits in about 1.5 lookups (albeit using
  a not-cheap hash function).  But the quoted paragraph above says "average
  overhead is less than 1 lookup".  What am I missing?
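
For reference, what I tried in Nanojit was essentially the following (a
simplified sketch of the idea, not the actual Nanojit code; the names are
mine):

  #include <stdint.h>
  #include <stddef.h>

  typedef struct Ins {
    int32_t imm;                /* the 32-bit immediate value */
    struct Ins *prevImm;        /* previous immediate in this block */
  } Ins;

  /* Linear CSE lookup: walk the chain of immediates, newest first. */
  static Ins *findImm(Ins *chainHead, int32_t imm)
  {
    Ins *ins;
    for (ins = chainHead; ins != NULL; ins = ins->prevImm)
      if (ins->imm == imm)
        return ins;             /* reuse the existing instruction */
    return NULL;                /* caller emits a new one and links it in */
  }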

Any extra info about this would be much appreciated!  Thanks.

Nick