- Subject: Re: Globals (more ruminations)
- From: David Manura <dm.lua@...>
- Date: Fri, 9 Jul 2010 01:57:06 -0400
On Thu, Jul 8, 2010 at 12:45 PM, Mark Hamburg <email@example.com> wrote:
> Why are globals "bad"? ... Why are globals "good"? ...
Ok, so when do I think globals are appropriate? The first and primary
case is when retrieving variables from the standard library (e.g.
`print` or `math.sqrt`). Here's why I think this is acceptable:
(1) Typos can be statically detected here. This is often done via
the "luac -p -l file.lua | grep ETGLOBAL" technique (e.g.
globals.lua), by flagging all gets absent from a whitelist and flagging
all sets. That whitelist can be defined statically, such as from the
Lua Reference Manual, or by dynamically querying _G. _G has the
benefit of automatically handling any custom objects you add to the
standard library, but it requires that the Lua state be clean (e.g.
calls to `module` pollute _G), so it is best run from a separate Lua
process or using some table other than the current _G. Typos to
members and signatures (e.g. "math.sqrrt(2,true)") are not detected in
the above approach. Although we can slightly extend the approach by
localizing `math_sqrt`, a more general solution, which avoids
rewriting variables in your code, is to write a more intelligent
static analyzer, along the lines of luaanalyze. So, it's just a
matter of how to make static checking more powerful and accessible,
making its use the norm rather than the exception.
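As a rough illustration of the whitelist check in (1), here is a minimal sketch that scans the text of a Lua 5.1-style `luac -p -l` listing. The function name `check_globals`, the sample listing lines, and their exact column layout are my assumptions for illustration; real listings should be checked against your own luac output, and later Lua versions use different opcodes for global access.

```lua
-- Sketch: flag suspicious globals in a Lua 5.1-style bytecode listing.
-- `listing` is the listing text; `whitelist` maps allowed names to true.
local function check_globals(listing, whitelist)
  local problems = {}
  for line in listing:gmatch("[^\n]+") do
    -- Source line number appears in brackets; the global name follows ';'.
    local srcline = line:match("%[(%d+)%]") or "?"
    local op, name = line:match("([GS]ETGLOBAL).-;%s*([%a_][%w_]*)")
    if op == "SETGLOBAL" then
      -- Every write to a global is flagged.
      problems[#problems + 1] =
        ("line %s: write to global '%s'"):format(srcline, name)
    elseif op == "GETGLOBAL" and not whitelist[name] then
      -- Reads are flagged only when the name is not whitelisted.
      problems[#problems + 1] =
        ("line %s: unknown global '%s'"):format(srcline, name)
    end
  end
  return problems
end

-- Whitelist built dynamically from the current global table
-- (this assumes the table is clean, as discussed above).
local whitelist = {}
for name in pairs(_G) do whitelist[name] = true end

-- Hypothetical listing excerpt, hand-written for this example.
local sample = table.concat({
  "\t1\t[1]\tGETGLOBAL\t0 -1\t; print",
  "\t2\t[2]\tGETGLOBAL\t1 -2\t; prnt",
  "\t3\t[3]\tSETGLOBAL\t0 -3\t; counter",
}, "\n")

local problems = check_globals(sample, whitelist)
for _, msg in ipairs(problems) do print(msg) end
```

Note that this only sees top-level names; a typo such as `math.sqrrt` passes unnoticed, which is the limitation described above.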
(2) Optimization is best left to the compiler. Globals have the
performance impact of table indexing (sometimes multiple indexes).
For most cases this is acceptable. For other cases, renaming
variables may increase performance, so it's tempting to do so. For
example, if I'm creating a standard library with a "string trim"
function that other people may use in unknown ways, and if I know
aggressive localizing can in certain cases have some measurable
impact, I may play it safe and localize even though it may slightly
uglify the code. However, my preference, since this transformation
can be performed mechanically, is to keep the code clean and leave
optimization to the compiler, either assuming LuaJIT or writing some
preprocessor or patch that would perform this optimization
without bothering the original source.
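To make the trade-off in (2) concrete, here is a small sketch of the same trim function written both ways. The names `trim_plain` and `trim_localized` and the particular trim pattern are my choices for illustration, not taken from any specific library; whether the localized form is measurably faster depends on the implementation and workload.

```lua
-- Plain version: each call looks up the global `string`,
-- then indexes it for `match`.
local function trim_plain(s)
  return (string.match(s, "^%s*(.-)%s*$"))
end

-- Localized version: the lookups happen once, at load time.
-- This is the mechanical transformation a tool could apply instead.
local string_match = string.match
local function trim_localized(s)
  return (string_match(s, "^%s*(.-)%s*$"))
end

print("[" .. trim_plain("  hello world  ") .. "]")
print("[" .. trim_localized("  hello world  ") .. "]")
```

The extra parentheses around the call drop any additional return values, so both functions return exactly one string.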
(3) Localizing every top level function (e.g. print) is cumbersome
to do manually. Although I can localize with `local _ENV = require
"_G"` and then access `_ENV.print` or even `print`, this doesn't
automatically define a local for each top-level function in _G. Lua
doesn't have an "import static foo.*" like Java.
(4) It can work ok if custom environments are implemented carefully
or avoided. If the current environment is changed, then the standard
library functions must remain exposed through the new environment.
Localizing these functions is one way to achieve this. It can be
preferable to expose them through a fallback (e.g. __index to _G) in
the current environment so that access works normally. This is ok on
paper. The problem occurs if you attempt, like package.seeall, to
reuse the environment table in some place where these standard library
functions should not be exposed. It takes some looking, but there are
ways to solve this like package.clean (with some caveats about
caching--bar using a proxy table or __newindex--in the probably rare
case of redefinition in the public namespace). Another solution,
which I usually do and which is a very simple avoidance of these
problems, is to use a separate local table (e.g. M) for the module's
public namespace. You may still use a local environment in the latter
solution, but it becomes mostly superfluous. BTW, the Lua 5.2 VM
makes "M.foo", "_ENV.foo" and "foo" equally efficient.
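The leak described in (4) can be shown without changing any function environment at all. This is only a sketch using plain tables: `env` stands in for an environment table with an `__index` fallback to _G that is also handed out as the module table, and `M` is the separate public table; both names are illustrative.

```lua
-- (a) One table serves as both environment and public namespace,
--     with a fallback to _G so standard functions stay reachable.
local env = setmetatable({}, { __index = _G })
env.double = function(x) return x * 2 end

-- (b) A separate local table holds only what the module exports.
local M = {}
local function double(x) return x * 2 end
M.double = double

-- Users of (a) can reach the standard library through the module table;
-- users of (b) cannot.
local leaked_a = env.print ~= nil
local leaked_b = M.print ~= nil

print("env exposes print:", leaked_a)
print("M exposes print:", leaked_b)
```

With (b) there is nothing to clean up afterwards, which is why I find it the simpler way to avoid the problem.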
(5) Localization is sometimes used to bind globals earlier, making
the module more immune to changes to external global variables
following module load, but I question whether this is the right
approach. This technique is utilized in Lua 5.1.4 strict.lua to
support sandboxing. I suspect maybe this should not be the module's
responsibility but rather that of the loader (e.g. `loadin` with
custom environment table).
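For readers unfamiliar with the early-binding effect in (5), here is a minimal sketch. The function names are hypothetical, and the temporary reassignment of the global `tostring` is only there to simulate a change made after module load.

```lua
-- Bound once, at "module load" time.
local early_tostring = tostring

local function describe_early(x)
  return early_tostring(x)   -- uses the captured function
end

local function describe_late(x)
  return tostring(x)         -- global looked up on every call
end

-- Simulate someone replacing the global after the module was loaded.
local saved = tostring
tostring = function() return "hijacked" end

local early_result = describe_early(42)
local late_result = describe_late(42)

tostring = saved             -- restore the original global

print(early_result, late_result)
```

Only the late-bound function sees the replacement. Whether that immunity belongs in the module or in the loader is the question raised above.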
Another case mentioned where globals are appropriate is the
interactive interpreter. However, I don't think globals are necessarily
unavoidable here. There are times I've wanted this to just work
(without an enclosing "do" block):
> local x = 1
> local y = x + 1
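The reason this fails today is that the interpreter compiles each line as its own chunk, so a local from one line is out of scope in the next. A small sketch that mimics this by compiling the two lines separately (the `compile` alias is just a compatibility shim of mine for `loadstring` in 5.1 versus `load` in later versions):

```lua
-- loadstring exists in Lua 5.1; load accepts strings in 5.2 and later.
local compile = loadstring or load

-- Each interactive line becomes a separate chunk with its own scope.
local chunk1 = assert(compile("local x = 1"))
local chunk2 = assert(compile("local y = x + 1"))

local ok1 = pcall(chunk1)
-- In the second chunk `x` is an (undefined) global, so the
-- arithmetic raises an error.
local ok2, err = pcall(chunk2)

-- The usual workaround: put both statements in a single chunk.
local chunk3 = assert(compile("do local x = 1; local y = x + 1; return y end"))
local ok3, y = pcall(chunk3)

print(ok1, ok2, ok3, y)
```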
Mark suggested some ideas to make this work. The basic idea is to
bind the locals declared in previous chunks into the current chunk,
almost as if the current chunk is lexically nested inside and at the
bottom of the previous chunk. This is likely implementable by
patching lua.c, using either source rewriting or internal
Globals can also be useful in DSLs to avoid inserting `local`,
`return`, and `...` throughout the DSL (though it can still remain
a difficult fit).
Now, I think globals get messier when you have custom environments, or
even multiple environments in the same file:
_ENV = module(...)
local tostring = tostring
return class('bar', function(_ENV)
  function test() ..... end
end)
Here, a nested environment is *trying to* reference a `foo` in the
parent environment and a `print` in the global _G table. Moreover,
the method name `tostring` conflicts with a local `tostring` in the
parent. There are ways to address this like aliasing the top-level
_ENV to a local of another name to permit disambiguation, but I think
the underlying problem is that we're trying to use some hacks with
globals to mimic lexical scopes in our custom language constructs
(module and class), but Lua's global resolution rules such as "locals
override globals" and "_ENV tables are not recursively queried up the
lexical nesting levels" may complicate making this work seamlessly.
Add to this the concerns in LuaModuleFunctionCritiqued.
Additionally, the function "test" is forward declared, which
unfortunately makes the definition misleadingly look like a global
definition. This mess, and numerous ways to shoot yourself in the
foot, suggests rewriting everything above without environments/globals
but rather a straightforward lexical scoping solution that "just