[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: LuaJIT IR Design
- From: chevalma@...
- Date: Fri, 13 May 2011 14:19:12 -0400 (EDT)
Thanks for your answers.
Since we're building a method-based JIT, this solution isn't exactly what
we need, but I think it may be possible to adapt some ideas from it. I
have been thinking that I could have an array of instructions in which
basic blocks are allocated with some reserve space for new instructions,
and either simply update a list of free regions when blocks are deleted,
or call some kind of compaction algorithm when the instruction list for a
method grows too large.
The main concerns I have are efficiently keeping a list of destinations
for instructions, which I think could possibly be done in a chained
(linked list) manner, and keeping a list of successors/predecessors/edges
for blocks, which perhaps could be handled similarly.
> Maxime Chevalier-Boisvert wrote:
>> 1. The post mentions using bidirectionally extensible arrays to
>> store instructions. Is one such array used per basic block? If not,
>> how are basic blocks efficiently allocated/moved within the array?
> It's a trace compiler -- it doesn't work at the basic block level.
> One might consider a single trace to be similar to an extended
> basic block (one entry, multiple exits). But a root trace may also
> hold a loop, so that's where the analogy breaks.
>> What's the advantage of being able to extend the array
> Constants grow downwards and don't interfere with regular
> instructions. This avoids some difficulties esp. for loop
> optimizations, which may need to emit new constants, but want
> their loads to be hoisted.
> Having the lowest numbered IR references also implicitly gives
> them a lower weight in the register allocation heuristics. Thus
> rematerialization is preferred over spilling.
> And the killer feature is to be able to test for a constant simply
> by looking at the reference. IR references are biased relative to
> the cut off point between the two parts of the IR. Anything below
> the bias value (< 32768) must be a constant. This test is heavily
> used and it's very cheap.
> The IR array is lazily grown and reused for subsequent traces. In
> practice very few moves or array extensions are needed. Once a
> trace has been assembled, it's copied/compacted to a newly
> allocated memory area.
>> My optimizer does alot of IR transforms which
>> merge/eliminate basic blocks, remove/replace/insert new
>> instructions. How is that done efficiently in LuaJIT?
> The former does not apply to a trace compiler. The latter can be
> avoided by doing forward transforms on-the-fly during trace
> recording and backward transforms on-the-fly during assembly.
>> 2. In my current layout I also keep a list of uses for each IR
>> instruction (all instructions reading the output value of a given
>> instruction). This is very useful when I want to replace the value
>> of an instruction by that of another instruction. What kind of trick
>> does LuaJIT use to do this efficiently?
> Folding happens before an instruction is actually emitted. But it
> may cause new instructions to be emitted and old instructions to
> fall unused, too. Unused instructions are simply ignored by the
> backward pass of the assembler. This gives you DCE for free.
> Backward transforms already start from a use and don't need a list
> of other uses.
>> 3. How are constants encoded in the LuaJIT IR? I saw it uses 16 bit
>> references to refer (I assume) to indices of other IR instructions
>> elsewhere in the instruction array. Are constants also stored in the
>> array, as instructions?
> Constants are stored like any other IR instructions. They just
> grow downwards in the IR.
> Note: the IR dump (-jdump=i) hides all of these details and merges
> constants into the output of the regular IR instructions.
>> 4. I saw the IR instructions have up to 2 operands. What about call
> A chain of call arguments (CARG) is used for extension. The IR
> dump hides this detail. It's much more important to have an
> orthogonal and linearly indexable IR (always exactly 64 bits per
> instruction) than to optimize for special cases.