- Subject: Lua 5.0 and globals (long)
- From: "Wim Couwenberg" <w.couwenberg@...>
- Date: Tue, 25 Feb 2003 14:33:08 +0100
Hi,
During my experiments with Lua 5.0 I compiled a list of issues that I ran
into. Almost all of them concerned the new globals system. Following the
mailing list lately, I guess I'm not the only one who has some difficulty
grasping this system in full. I have therefore edited my personal notes for
publication on the list.
Note: although it does contain a list of suggestions, the note below should
not be considered as a wish list but first and foremost just as food for
thought. In fact, your thoughts are as good as mine and I would like to
hear about them! Could we not do more with less? The simple Lua 4 approach
has got the job done for us thus far (sandboxing in a sort of CGI-like system.)
Well, anyway, here goes...
Wim
--8<--snip--snip--8<--snip--snip--8<--
Some practical observations about tables of globals in Lua 5.0
Wim Couwenberg, February 25, 2003
Index:
1) Summary
2) Traps
3) Suggestions
1) Summary
This note lists some personal experiences with the globals system in Lua
5.0. Part 2) sums up some traps that I encountered (and fell into!) Based
on these, part 3) makes a number of suggestions for an adjusted globals
scheme in Lua. In short these are:
- Closures keep an environment table.
- Introduce a table of globals per thread.
- Each new closure binds to the _globals_ of the thread as its environment
instead of to the environment of the instantiating closure.
- Abandon the get/setglobals calls for closures and stack levels.
- A "specified block" statement might be useful to temporarily push a new
table of globals for a thread.
2) A number of recurring traps
OVERALL EXPERIENCE: The current globals system in Lua 5.0 is
counterintuitive and is likely to bite you when you least expect it, _even_
if you are aware of its pitfalls.
1. FACTORIZATION PROBLEM. The setglobals call can change the globals of a
closure. In practice this bit me more than once, even when I was prepared.
The scenario: a complex function is split into different parts (factored)
but in the end setglobals is only applied to the main closure. This is a
trap because factorization can take place some time after the first
implementation.
2. STEALING GLOBALS. With getglobals you can get at the globals of a
closure. This global space can contain much more than you might want to
reveal (i.e. in a sandbox.) You will be caught off guard sooner or later.
Setting the __globals field to protect or masquerade globals is a non-local
operation. It can influence other, unknown closures that share the same
globals in unpredictable and unexpected ways. (But see (*) below.)
3. INDIRECT STEALING. The factorization problem reappears in the form of
"indirect stealing." A function can return other functions, that might not
be properly protected against stealing globals.
4. STACK INTROSPECTION. A getglobals call can reveal the globals table at a
given stack level. This is an even bigger threat than the "stealing
globals" problem. To protect yourself from stack introspection, you must
eliminate (or alter) the getglobals function or use the __globals guard _on
each_ stack level at _all times_.
5. SHIFTING LEVELS. Factoring a function into several helper functions will
shift the stack level. For example, a getglobals(n) call should probably
become getglobals(n+1) in a helper function if n>1.
6. MISSING LEVELS. The globals table of a function call that is bytecoded
as a tail call will not be present in the stack. In particular stack levels
do not necessarily match function call nesting levels. There is no way to
spot this inconsistency (yet.) The result of a getglobals(n) call is
inherently unpredictable.
7. CALL LEVEL GLOBALS. Many functions will need to revert to the globals
table of the calling closure. The loadfile and require calls are evidence
of this. But 1) "loadfile" depends on stack levels (trap 6) and 2)
"require" falls into trap 2. Because require is a C function, its globals
cannot be changed. It is practically impossible to consistently get the
caller's globals because of trap 6.
8. IMPLICIT SWITCHING. That a function uses its own globals internally is
none of the caller's concern, but that a function is likely to _return_ or
_produce_ closures with different globals than those locally in effect is
almost always confusing. Moreover, you cannot always detect the closure's
globals (because they can be protected or masqueraded), nor can you always
change them.
(*) FUNCTION PROXY. Instead of setting the __globals field, you can protect
a given function's globals table by wrapping the function in a "function
proxy" as follows:
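-- Wrap func in a function proxy
function func_proxy(func)
  -- keep unpack as an upvalue: the proxy's new globals table is empty,
  -- so a global lookup of "unpack" inside proxy would yield nil
  local unpack = unpack
  local function proxy(...)
    return func(unpack(arg))
  end
  -- the __globals guard hides the real table from getglobals(proxy)
  setglobals(proxy, {__globals = false})
  return proxy
end
-- Example: protect the import function
import = func_proxy(import)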
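
The factorization problem (trap 1) can be reproduced in a few lines. This is only a minimal sketch under some assumptions: it uses setfenv (which, as far as I know, is the name setglobals received in the Lua 5.0 release) where it exists, and otherwise falls back on the environment argument of load found in later Lua versions. The names load_in, within_limit and sandbox are made up for the illustration.

```lua
-- Compile `src` with `env` as its table of globals (hypothetical helper).
local function load_in(src, env)
  if setfenv then
    -- Lua 5.0 release / 5.1: per-function environments
    local chunk = assert(loadstring(src))
    setfenv(chunk, env)
    return chunk
  end
  -- Lua 5.2 and later: the environment is passed to load
  return assert(load(src, "=sketch", "t", env))
end

limit = 100   -- a global of the host program

-- Helper that was factored out of the main function at some later time.
-- Nobody changed its globals, so it still sees the host's table.
local function within_limit(n)
  return n <= limit
end

local sandbox = { limit = 1, within_limit = within_limit }

-- Only the main closure gets the sandbox as its globals.
local main = load_in("return 50 <= limit, within_limit(50)", sandbox)

local inline_check, factored_check = main()
print(inline_check, factored_check)   -- prints: false   true
```

The same test gives two different answers, depending only on whether it is written inline or lives in a helper that was overlooked when the globals were switched.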
3) Suggestions
The idea that each function has its own table of globals (or an "environment
table" as it is now called) is not a bad one, but its use should be less
confusing (and less dangerous.) I like to think of an
environment table as being just a special upvalue in a closure.
1. Like any other upvalue, the environment should be private to the closure,
so getglobals and setglobals with a closure argument should be abandoned.
2. A closure cannot change its environment table (it "acts as" an upvalue.)
The environment can be obtained from within the closure by an
"environment()" call. (Or should it be an implicitly named upvalue __env?
In any case, I want to reserve the get/setglobals calls for the thread's
globals, see 7. below.)
3. The environment of a closure should be (implicitly) bound when the
closure is instantiated (and only then.)
4. Each thread (coroutine) defines a table of globals that is uniquely
identifiable at any time during the thread's execution. I will refer to
this table as "the globals" of that thread.
5. Currently the environment of a new closure is bound to the environment of
the closure that instantiates it. This is often not what is intended.
Instead, a new closure's environment should be bound to the globals of the
thread.
6. In particular, any two closures that are instantiated either explicitly
(by a function statement) or implicitly (by some function call) within the
same thread will bind to the same globals unless the globals are changed
explicitly between instantiations.
7. The globals can _only_ be changed explicitly so IMPLICIT SWITCHING (trap
8) will be avoided. Simple get/setglobals calls can be used to switch a
thread's globals (but see also (**) below.)
8. With this new binding rule, the main (and only sensible?) use for
getglobals(<number>) will become obsolete, namely a getglobals(2) as used by
loadstring and loadfile for example. Both the getglobals(<number>) and
setglobals(<number>,<table>) calls should be abandoned (for stack level 1 it
is replaced by an "environment" call as in 2. above.)
9. The old binding rule for closures can be emulated by temporarily
switching the globals to the environment(), or in terms of remark (**)
below:
in environment() do ... end
(**) PUSHING GLOBALS. One of the main uses for switching globals is to
temporarily "push" a new table of globals, do some processing and then "pop"
it to restore the old globals. It could be useful to have an explicit
statement to support this. A "specified block" for example:
in <new-globals-table> do
...
end
The new globals table (i.e. an expression evaluating to a table) remains in
effect only within the scope of the specified block. One can even argue
that any form of setglobals should be replaced by specified blocks (mainly
because it maintains balanced globals substitution.) Then only a
getglobals() call needs to remain. Example:
local CGI = {}
-- these will be needed in CGI
local dofile = dofile
local loadfile = loadfile
local print = print
in CGI do
  host = "aap"
  function init() ... end
  -- CGI serves as the table of globals
  -- within the script that is run.
  dofile "cgi.config"
  -- load setup script to run it later
  setup = loadfile "cgi.setup"
  -- some weird closure example
  function host_printer()
    return function() print(host) end
  end
  -- our own host_printer binds to CGI
  -- (the current globals)
  print_host = host_printer()
end
-- the setup script runs with
-- CGI as its environment (but
-- not as its globals!)
CGI.setup()
host = "noot"
CGI.print_host() -- prints "aap"
CGI.host_printer()() -- prints "noot" [!]
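
The balanced push/pop discipline behind the specified block can be approximated in plain Lua. This is a rough sketch only, not the proposed statement itself: the "globals of the thread" are modelled by an ordinary stack of tables, and globals_stack, current_globals and in_globals are hypothetical names. It assumes a Lua version that has the # length operator.

```lua
-- Hypothetical stand-in for "the globals of the thread": a stack of tables.
local globals_stack = { {} }

local function current_globals()
  return globals_stack[#globals_stack]
end

-- Emulates `in tbl do body() end`: push tbl, run body, always pop again.
local function in_globals(tbl, body)
  globals_stack[#globals_stack + 1] = tbl
  local ok, result = pcall(body)
  globals_stack[#globals_stack] = nil    -- pop, even if body raised an error
  if not ok then error(result, 0) end    -- re-raise after restoring
  return result
end

local base = current_globals()
local CGI = {}

local seen = in_globals(CGI, function()
  current_globals().host = "aap"
  return current_globals()
end)

print(seen == CGI, CGI.host, current_globals() == base)  -- prints: true  aap  true
```

Because the pop happens in one place, the old table is restored on every exit path, including errors. A bare setglobals call cannot guarantee that.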