[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: CSE chains in LuaJIT 2.0
- From: Nicholas Nethercote <n.nethercote@...>
- Date: Wed, 5 May 2010 22:57:15 -0700
On Mon, May 3, 2010 at 5:26 AM, Mike Pall <email@example.com> wrote:
> The IR is in strict SSA form. Any reference operand of an
> instruction must point to an instruction which strictly precedes
> it in the instruction stream. So if you're searching for FOO(a, b)
> in the chain for FOO you can stop when you reach lim = max(a, b).
> An instruction at or before lim cannot possibly contain references
> to either a or b.
Ah, nice. We can't take advantage of this in Nanojit right now
because intermediate code is allocated in chunks and there's no
guarantee that chunks are allocated in ascending order. Maybe I
should look at adding that guarantee.
> Also, LuaJIT's IR is not a low-level IR. There's much less need
> for having field offsets etc. as constants. I have rarely seen a
> trace with more than a handful of integer constants.
> Most field offsets etc. are encoded as implicit or explicit
> literals in the instructions themselves. E.g. SLOAD #1 is a load
> from stack slot number 1 and the 1 is a literal, encoded in one of
> the operands.
Nanojit allows offsets to be encoded directly in loads/stores, but
there are still a fair number of integer immediates. A lot of these
are small, eg. used for shifting/masking/etc for boxing and unboxing.
And pointer immediates are used often too, for checking if objects are
of particular classes, eg. is this an array?
> As you can see, all constants are close together. LuaJIT's IR is
> very compact (64 bit per instruction). Doing a lookup in the KINT
> chain is more or less a linear lookup in the same cache line, i.e.
> basically free.
So the integer chain lookup is O(n), where n is the number of
immediates in the block, but (a) you tend not to have that many
immediates, and (b) the IR is dense enough that not much memory is
touched as part of the search?
Thanks for the detailed answer.