lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


This mail explains the general idea we used to implement lexical scoping
in Lua. Sorry for the delay in answering several questions about this.

The main problem for implementing lexical scoping in Lua is that, when we
first see a variable, we do not know whether it will be used by an inner
function, and therefore whether it may have to outlive the function that
created it (as I do not know a better name for them, I will keep calling a
local variable that is used by an inner function an "upvalue"). Instead of
trying to solve this directly (for instance by changing previous code when
a variable is used by inner functions), we took a more dynamic approach:
all local variables live in the stack, as long as its block is active. When
a block that declares an upvalue finishes, then (and only then) the upvalue
is moved from the stack to the heap (that is, to a dynamically allocated
struct that holds its value). At compile time, whenever a block ends, we
know for sure whether it declares upvalues, and then we can generate a
specific opcode, CLOSE, to move those upvalues from the stack to the heap.

Closures now keep pointer to TObjects, instead of TObjects themselves.
Those pointers may point to the stack, if the upvalue is still active, or
to the heap. Those closures are "flat": when we create a closure, all its
upvalues are either on the stack (that is, they are declared in the
immediate enclosing function), or in the closure of the enclosing function
(those may point to lower levels in the stack or to the heap). Any closure
that points to the stack we call an "open" closure, and we keep a list of
all current open closures. Whenever we execute a CLOSE instruction, we
traverse this list correcting any closure that points to those upvalues
which are moving from the stack to the heap. To avoid traversing the whole
list, the list is kept sorted by stack level, so that as soon as we find
a closure that do not need correction we can stop the traversal.
(We also need to traverse this list when we return from a function and
when we handle errors [return from multiple functions].)

What is the cost of all that? Upvalues in their original level are always
in the stack, and are handled like any regular local variable. Upvalues in
inner levels are accessed through opcodes GETUPVAL and SETUPVAL, which
involves only an array indirection:

      case OP_GETUPVAL: {
        int b = GETARG_B(i);
        setobj(ra, cl->upvals[b].val);
        break;
      }

A CLOSE opcode is only issued when there are upvalues to be closed. For
each closure traversed, at least one correction is made. So, the cost is
proportional to the number of upvalues used by a closure and the number of
closures created by the program (this cost can then be credited as a small
increase in the cost of creating closures, a not-so-cheap operation).

The last cost is that, when creating a new closure with open references
(the most common case), we must insert it in its proper place in the list
of open closures. Usually the proper place is right in front of the list,
so the cost is low. But we can build some artificial examples that may make
the cost of creating a new closure proportional to the number of open
closures already created. (For this to be a problem it must involve two
different closures being generated in loops, that is, in bulk quantities).

-- Roberto