- Subject: CSE chains in LuaJIT 2.0
- From: Nicholas Nethercote <n.nethercote@...>
- Date: Sun, 2 May 2010 22:06:00 -0700
Hi,
I read http://article.gmane.org/gmane.comp.lang.lua.general/58908 and was
very interested in this part:
- Skip-list chains: The IR is threaded with segregated, per-opcode
skip-list chains. The links are stored in a multi-purpose 16 bit
field in the instruction. This facilitates low-overhead lookup
for CSE, DSE and alias analysis. Back-linking enables short-cut
searches (average overhead is less than 1 lookup). Incremental
build-up is trivial. No hashes, no sets, no complex updates.
I looked at the LuaJIT-2.0.0-beta4 code; this (from lj_opt_fold.c) seems to
be the relevant function:
TRef LJ_FASTCALL lj_opt_cse(jit_State *J)
{
  /* Avoid narrow to wide store-to-load forwarding stall */
  IRRef2 op12 = (IRRef2)fins->op1 + ((IRRef2)fins->op2 << 16);
  IROp op = fins->o;
  if (LJ_LIKELY(J->flags & JIT_F_OPT_CSE)) {
    /* Limited search for same operands in per-opcode chain. */
    IRRef ref = J->chain[op];
    IRRef lim = fins->op1;
    if (fins->op2 > lim) lim = fins->op2;  /* Relies on lit < REF_BIAS. */
    while (ref > lim) {
      if (IR(ref)->op12 == op12)
        return TREF(ref, irt_t(IR(ref)->t));  /* Common subexpression found. */
      ref = IR(ref)->prev;
    }
  }
  /* Otherwise emit IR (inlined for speed). */
  {
    IRRef ref = lj_ir_nextins(J);
    IRIns *ir = IR(ref);
    ir->prev = J->chain[op];
    ir->op12 = op12;
    J->chain[op] = (IRRef1)ref;
    ir->o = fins->o;
    J->guardemit.irt |= fins->t.irt;
    return TREF(ref, irt_t((ir->t = fins->t)));
  }
}
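To make sure I'm reading this right, here is a stripped-down, self-contained
version of what I think the chain scheme amounts to. All names, sizes and
types below are mine (not LuaJIT's), and I've dropped the type/flag handling,
so corrections welcome:

  #include <stdint.h>

  #define SK_NOPS  256    /* hypothetical opcode count */
  #define SK_NINS  4096   /* hypothetical buffer size; refs start at 1 */

  typedef struct SkIns {
    uint8_t  o;      /* opcode */
    uint16_t op1;    /* operand references */
    uint16_t op2;
    uint16_t prev;   /* back-link: previous instruction with same opcode */
  } SkIns;

  typedef struct SkState {
    SkIns    ins[SK_NINS];
    uint16_t nins;            /* number of instructions emitted so far */
    uint16_t chain[SK_NOPS];  /* newest ref per opcode; 0 = empty chain */
  } SkState;

  /* CSE lookup + emit: only the chain for this opcode is walked, newest
     first, and the walk stops once refs fall to or below max(op1, op2),
     mirroring the "while (ref > lim)" early-out in lj_opt_cse(). */
  static uint16_t sk_cse(SkState *S, uint8_t op, uint16_t op1, uint16_t op2)
  {
    uint16_t lim = op1 > op2 ? op1 : op2;
    uint16_t ref = S->chain[op];
    while (ref > lim) {
      if (S->ins[ref].op1 == op1 && S->ins[ref].op2 == op2)
        return ref;                /* common subexpression found */
      ref = S->ins[ref].prev;      /* stay within this opcode's chain */
    }
    /* Not found: emit a new instruction and make it the new chain head.
       (No overflow check, for brevity.) */
    ref = ++S->nins;
    S->ins[ref].o = op;
    S->ins[ref].op1 = op1;
    S->ins[ref].op2 = op2;
    S->ins[ref].prev = S->chain[op];
    S->chain[op] = ref;
    return ref;
  }
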
I have several questions about all this...
- The quoted paragraph above mentions "skip-list chains" but AFAICT the code
is just using a normal list, not a skip-list
(http://en.wikipedia.org/wiki/Skip_list). Have I misunderstood something?
- I don't understand the "while (ref > lim)" condition; it looks like the
list search is terminated early in some cases, but I don't understand what
those cases are. The IR is very dense, and pithy code like that is hard
for newcomers to understand.
- I tried implementing a simple list for CSE in Nanojit (used by Mozilla and
Adobe), which currently uses a hash table for this; a rough sketch of what
I tried is below. I tried it only on 32-bit immediates, and the performance
was awful: there can be dozens of distinct immediates in a single block, so
searching through them linearly was much worse than doing a hash table
lookup, which usually hits in about 1.5 lookups (albeit using a not-cheap
hash function). But the quoted paragraph above says "average overhead is
less than 1 lookup". What am I missing?
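For reference, this is roughly the shape of what I tried for the immediates
(not the actual Nanojit code, just the idea, with made-up names):

  #include <stdint.h>

  #define MAX_IMM 1024  /* arbitrary cap for the sketch */

  typedef struct ImmPool {
    int32_t  val[MAX_IMM];  /* immediate values, in emission order */
    uint16_t nimm;          /* number of immediates emitted so far */
  } ImmPool;

  /* Return the index of an existing immediate with this value, or add one.
     Note the scan has to cover the whole pool; with dozens of distinct
     immediates per block this is what turned out to be much slower than
     the hash table lookup. */
  static uint16_t imm_cse(ImmPool *p, int32_t v)
  {
    uint16_t i;
    for (i = 0; i < p->nimm; i++)
      if (p->val[i] == v)
        return i;           /* reuse the existing immediate */
    p->val[p->nimm] = v;    /* no bounds check, for brevity */
    return p->nimm++;
  }
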
Any extra info about this would be much appreciated! Thanks.
Nick