lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Roberto Ierusalimschy escribió:

> This mail explains the general idea we used to implement lexical scoping
> in Lua. Sorry for the delay in answering several questions about this.

No, don't apologise. Thanks for explaining. It's very useful.

(Note: I use the terminology "binding" in the following rather than
"upvalue". The explanation is a little later in the discourse.)

The idea seems very sensible, and is relatively efficient, as far as I can
see.

I remain with one question, and one suggestion. The question is probably
easier: will C closures be presented with the same interface? That is, will
C closures (need to) use some API call to indirect through to the value of
an enclosed binding, and therefore have the ability to rebind, or will they
be presented with "unbound" (resolved) values rather than bindings? There
would seem to be arguments both ways; changing the C interface will render
a lot of code obsolete; on the other hand, allowing C closures to make use
of bindings provides a certain amount of power: in particular, it makes
possible the implementation of user-objects which reference
garbage-collected Lua objects (that is, by implementing user-objects as
collections of bindings, or in other words closures, rather than or in
addition to user-objects as malloc'd RAM).

For example, such an implementation might provide an API like
lua_getbinding(L, i) / lua_setbinding(L, i) which push / pop the stack from
/ to the indexed binding. Moderate compatibility with old code could be
provided with the API lua_pushbindings(L) which pushes (the values of) all
enclosed bindings onto the stack, effectively emulating the current
implementation.

To create a binding in the first place, one would have an API like
lua_newbinding(L) which perhaps popped the stack into a newly created
binding and then pushed the binding (i.e. the reference to the value) back
onto the stack; as I understand it, this is essentially what the CLOSE VM
operation would do. Whether there would be anything which could be done
with this object other than eventually enclose it into one or more closures
is worth thinking about.

It is interesting to observe that, with or without the above C interface,
what is being created is, in effect, fixed-length mutable vectors (and that
the existing implementation is, in effect, fixed-length immutable vectors).
I have speculated (to myself) several times about whether there is an
interesting language feature hidden in this fact.

While a very clever implementation, it is certainly going to be
significantly slower than the current implementation, particularly as it
requires (as far as I can see) a memory allocation for each enclosed
binding ("upvalue"), in addition to the memory allocation for the closure.
In a typical case of a function with a single enclosed binding, this
doubles the memory allocations (and correspondingly the time for garbage
collection.) As you say, this can be seen as an increment to the cost of
creating a closure, which is not cheap, but it is not (or at least should
not be) significantly slower than creating a table. I draw the analogy
because tables are a plausible alternative to lexically-scoped bindings.

I would say that an upvalue is different from a binding by virtue of being
immutable; in that sense, it is truly a value whereas a binding is (a)
variable. I accept that both are useful, but I would venture to say (at the
risk of being shot down by fans of mutability) that upvalues are more
frequent.

Be that as it may, I wonder if it would be possible to achieve the best of
both worlds without complicating the design of the language too much; that
is, whether it would be possible to have both bindings and upvalues. The
question becomes, what is an upvalue in a world of bindings? There are two
possible answers: 1) it is an immutable binding; 2) it is a binding with
only one referrer. In the second case, the syntax of the language could (or
not) enforce immutability. The suggestion "on the table" is the second
alternative; that is, if I want a traditional upvalue, I enclose it's
creation in a block in which a unique local variable is assigned the value
and then enclosed (and thus continues to be mutable).

On the other hand, the first alternative could be implemented, at worst at
the cost of checking an indirection flag. In fact, the indirection flag
only needs to be checked on set (and then only to enforce immutability). In
such an implementation, both bindings and upvalues would be created with
indirection pointers, but in the latter case the indirection pointer would
be an internal pointer within the same memory block, which could include
more than one upvalue. This increases the size of allocation (from the
status quo) but limits the number of allocations to the number of actual
mutable bindings. Maintaining the existing % syntax for such upvalues (in
effect, "anonymous immutable bindings") would be one possibility, although
personally (and I'm certain to be outvoted on this) I would find it more
useful to syntactically mark mutable bindings because their effects need to
be examined more closely when reviewing code.

I fear that the foregoing was disordered and prolix, and apologise if that
is the case. It's a bit late, but I wanted to get some of my thoughts down
for discussion, if anyone is so inclined.

Rici