lua-users home
lua-l archive


Vyacheslav Egorov wrote:
> Yes, I see now that there is no problem with such caching. Corner
> cases can be detected either by static analysis and/or by runtime
> checks and deoptimization. One can even handle coroutine.yield by
> flushing cached globals.

Yielding from coroutines and caching of globals are orthogonal
aspects. A coroutine switch changes the current thread pointer and
with it the current Lua stack, that's it. This implicitly changes
the references to the current function and its environment (i.e.
globals), too.

The trace recorder can easily continue inlining across coroutine
boundaries. But it's currently disabled, since I'd need to manage
multiple stacks in parallel. I may add this later, though.

> By the way, is LuaJIT 2.0 capable of outputting correct error
> messages for errors occurring in optimized code (e.g. "attempt to
> index global 'x' (a nil value)")? At first glance it seems that
> the information required for this is the same as the information
> required for deoptimization or "cached globals flushing".

Yes, this works fine. Every guard has all information necessary to
seamlessly return to the interpreter. This is called an exit map
(stack slot -> IR ref) and allows reconstructing the Lua stack at
this point in the control flow. The failed type check in your
example would exit to the interpreter which would then redo the
type check and raise the same error you'd get from plain Lua (with
all details, like variable and type names).
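
The exit-map idea can be sketched in a few lines of Python. The names
and data layout here are made up for illustration only; LuaJIT's
actual structures differ:

```python
# Hypothetical sketch of an exit map (stack slot -> IR ref), not
# LuaJIT's real data structures.  When a guard fails, the map tells
# us which compiled-trace value holds the contents of each
# interpreter stack slot, so the Lua stack can be rebuilt.

def restore_stack(exit_map, ir_values):
    """Rebuild interpreter stack slots from trace values.

    exit_map  : dict mapping Lua stack slot -> IR reference
    ir_values : dict mapping IR reference -> current runtime value
    """
    return {slot: ir_values[ref] for slot, ref in exit_map.items()}

# A guard fails somewhere in the trace:
exit_map = {0: "ir_17", 1: "ir_42"}    # slot 0 <- IR 17, slot 1 <- IR 42
ir_values = {"ir_17": 3.14, "ir_42": "foo"}
stack = restore_stack(exit_map, ir_values)
# The interpreter resumes with this stack, redoes the failed check
# and raises the usual Lua error with full details.
```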

> Also, Mike, could you please point me to some info about: typed SSA
> and store sinking/load hoisting?

A typed SSA IR just means that all instructions carry a type with
them (a 4 bit field). So e.g. the same EQ opcode can be applied to
numbers, strings, tables etc. (which generates very different code
in the backend). One could also split them up into specialized,
implicitly typed operations. But this would be less orthogonal and
would just need more bits for the opcode instead.
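
As a rough illustration of such a typed IR (assumed names and
encoding, not LuaJIT's actual one), an instruction can be modeled as
a generic opcode plus a small type tag:

```python
# Sketch of a typed SSA instruction: one generic opcode (e.g. EQ)
# plus a small type tag, rather than many implicitly typed opcodes.
from collections import namedtuple

# A 4-bit type field is enough for a handful of tags:
NUM, STR, TAB = 0, 1, 2

IRIns = namedtuple("IRIns", "opcode type op1 op2")

eq_num = IRIns("EQ", NUM, "ref1", "ref2")  # numeric comparison
eq_str = IRIns("EQ", STR, "ref3", "ref4")  # string comparison

# Same opcode, different type tag -- the backend dispatches on the
# (opcode, type) pair and emits very different machine code.
```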

Loop-invariant code hoisting is quite simple in a linear SSA
representation: an instruction is invariant if all of its operands
are invariant. Constants are invariant by definition. Since all
operands are defined before they are used, this can be done in a
single forward pass. This is basically a two-liner to propagate
invariant bits.
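
That forward pass can be sketched like this (the IR representation is
an assumption for illustration; the core really is about two lines):

```python
# Loop-invariant detection on a linear SSA IR in one forward pass.
# Each instruction is (dest, operands).  The seed set holds values
# that are invariant by definition: constants and values defined
# before the loop.  Since defs precede uses, one pass suffices.

def mark_invariant(instructions, invariant_seed):
    invariant = set(invariant_seed)
    for dest, operands in instructions:
        if all(op in invariant for op in operands):
            invariant.add(dest)
    return invariant

loop = [
    ("t1", ("k", "c2")),   # t1 = k * c2 : all operands invariant -> hoistable
    ("i2", ("i1", "c1")),  # i2 = i1 + c1: i1 varies per iteration -> variant
    ("t2", ("t1", "i2")),  # t2 = t1 + i2: depends on i2 -> variant
]
inv = mark_invariant(loop, {"k", "c1", "c2"})
```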

The two notable exceptions are loads and stores. A load is only
invariant if the above condition holds _and_ there are no
conflicting stores in the same loop.

E.g. a load from t[i] is never invariant (because i is variant).
t[5] is invariant if t is invariant, except if there's a
conflicting store like t[i] = 1 (unless you can prove i ~= 5). But
note that neither t[3] = 1 (different constant) nor t.foo = 1
(different kind of store) would conflict.

This line of reasoning is called alias analysis and it employs a
load/store disambiguation algorithm. A basic algorithm for Lua is:
different kinds of loads/stores never alias (hash vs. array), keys
with different constants or types never alias, all others may
alias. An improvement is to deepen the analysis (e.g. t[i] and
t[i+1] do not conflict) and to dynamically add disambiguation
guards (e.g. t1[i] does not alias t2[i] if t1 ~= t2 holds).
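
A sketch of the basic disambiguation rules in Python (the access
encoding here is made up for illustration):

```python
# Basic load/store disambiguation: an access is (kind, key) where
# kind is "array" or "hash" and key is ("const", value) or
# ("var", name).  Rules: different kinds never alias; keys with
# different constants (or types) never alias; all others may alias.

def may_alias(a, b):
    (kind_a, key_a), (kind_b, key_b) = a, b
    if kind_a != kind_b:
        return False                 # hash vs. array access never alias
    ta, va = key_a
    tb, vb = key_b
    if ta == "const" and tb == "const":
        return va == vb              # different constants/types never alias
    return True                      # a variable key may alias anything

# t[5] vs. t[i] = 1  -> may alias (unless i ~= 5 can be proven)
# t[5] vs. t[3] = 1  -> different constants, no conflict
# t[5] vs. t.foo = 1 -> different kind of store, no conflict
```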

The simplest rule for stores is that they are always variant. But
this leads to rather inefficient code. An improvement is to sink
them into side exits unless there's a conflicting load in the same
loop (the same alias analysis applies here, too).

Sinking means the store itself is only executed in the side exit
branch, but the value stays a plain reference in the main branch
(i.e. can be held in a machine register). This only works if
stores are forwarded to loads even across loop iterations;
otherwise there will usually be a conflicting load. Scalar stores
can trivially be
sunk since different scalars have different stack slot numbers and
can never alias each other (and there is no way to index the stack
with a variable key).

The linear IR for loops is then encoded twice by the backend: once
with all instructions (a kind of pre-roll) and once with only the
variant instructions for the loop body plus a jump back to the
middle.

Side exits are lazily encoded. The first few times they are taken,
the Lua stack is rebuilt based on the exit map (in C code). This
is slow, but it doesn't matter unless the exit is taken very
often. In this case the recorder starts to extend the trace with a
new branch and recompiles the whole trace tree on success (this
handles outer loops, too). The exit map is only ever converted to
explicit stores plus a fast exit to the interpreter if recording
repeatedly fails for this exit (e.g. an error is raised).
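
The exit-hotness policy can be sketched like this (thresholds and
names are invented for illustration; LuaJIT's actual counters and
limits differ):

```python
# Lazy side-exit handling: first take the slow path (rebuild the
# Lua stack from the exit map), then try to extend the trace with a
# branch once the exit gets hot, and only patch in explicit stores
# plus a fast exit if recording keeps failing.

HOT_EXIT = 10        # assumed threshold: takes before recording a branch
MAX_FAILS = 3        # assumed limit: failed recordings before fast exit

class SideExit:
    def __init__(self):
        self.taken = 0
        self.fails = 0
        self.mode = "slow"           # restore stack from exit map (slow)

    def on_taken(self):
        self.taken += 1
        if self.mode == "slow" and self.taken >= HOT_EXIT:
            if self.record_branch():
                self.mode = "branch"  # trace tree extended and recompiled
            else:
                self.fails += 1
                if self.fails >= MAX_FAILS:
                    self.mode = "fast"  # explicit stores + fast exit

    def record_branch(self):
        return False  # stub: recording may fail (e.g. an error is raised)
```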

> I heard about Michael Franz's publications on typed SSA, but now
> cannot find any of them online.

His publication list is at: http://www.ics.uci.edu/~franz/

A related read (does store sinking, too):
  The Multiflow Trace Scheduling Compiler
  P. Geoffrey Lowney et al., 1992
  http://citeseer.ist.psu.edu/lowney92multiflow.html

> The same for store sinking/load
> hoisting, I at first thought that these optimizations are classical
> ones, but a quick look through indexes of Muchnick's and Allen's books
> revealed nothing.

Hoisting and sinking are just some special cases of "code motion".
Muchnick has section 13.2 "Loop-Invariant Code Motion".

--Mike