Detecting Undefined Variables

"How can access to undefined (undeclared) variables be caught in Lua?" is a frequently asked question.
Various approaches have been used, as described below. They differ in when and how access to undefined global variables is detected. First, let's consider the nature of the problem...
In Lua programs, typos in variable names can be hard to spot because, in general, Lua will not complain that a variable is undefined. For example, consider this program that defines two functions:
function f(x) print(X) end
function g(x) print(X + 1) end
Lua gives no error when loading this code. The first two lines
might be wrong (e.g. "x" mistyped as "X") or they might not be (maybe
X is some other global variable). In fact, Lua has no way of
knowing whether the code is wrong. The reason is that if a variable is
not recognized by Lua as a local variable (i.e. declared statically
with the "local" keyword or as a function parameter), the variable is instead
interpreted as a global variable (as is the case for "X"). Whether a
global variable is defined, however, is not as easy to determine or
describe. X has the value t['X'], where t = getfenv() is the "environment table"
of the currently running function. X always has a value, though it
is probably nil if X was a typo. We might interpret X being nil as
X being undefined, but whether X is nil can only be determined at
run-time. For example:
-- X is "undefined"
f(10) -- prints nil
X = 2 -- X is defined
f(10) -- prints 2
X = nil -- X is "undefined" again
f(10) -- prints nil
Even the above runs without error. When X is nil, print(X) becomes
print(nil), and it is valid to print a nil value. However, consider
calling the function g:
g(10)
This fails with the error "attempt to perform arithmetic on global 'X' (a nil value)". The reason is that print(X + 1) becomes
print(nil + 1), and it is invalid to add nil to a number. The error
is not observed, however, until the code nil + 1 actually executes.
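The behaviour just described can be reproduced with a short script. This is a sketch that loads f and g into a private environment table rather than _G (so nothing leaks into the real globals) and replaces print with a recording stand-in. The helper load_in is an illustrative name that papers over the difference between Lua 5.1 (setfenv) and Lua 5.2+ (load with an env argument).

```lua
-- Sketch: loading code that uses an undefined global succeeds; the problem
-- only surfaces when the faulty expression is actually evaluated.

-- illustrative helper: compile `src` so that its globals live in `env`
local function load_in(src, env)
  if setfenv then                                   -- Lua 5.1 / LuaJIT
    local fn = assert(loadstring(src))
    setfenv(fn, env)
    return fn
  end
  return assert(load(src, "=example", "t", env))    -- Lua 5.2 and later
end

local printed = {}
local env = {
  -- stand-in for print that records its argument instead of writing it
  print = function(v) printed[#printed + 1] = tostring(v) end,
}

-- Compiling and running the chunk only defines f and g: no error yet.
load_in([[
  function f(x) print(X) end
  function g(x) print(X + 1) end
]], env)()

env.f(10)        -- records "nil": X is looked up as env.X
env.X = 2
env.f(10)        -- records "2"
env.X = nil

-- nil + 1 is only rejected now, when g actually runs
local ok, err = pcall(env.g, 10)
print(ok, err)
```

The exact wording of the error differs between Lua versions, but it always mentions arithmetic on a nil value.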
Obviously, we may want to detect undefined global variables more proactively, such as detecting them at compile time or at least prior to production release (e.g. inside a test suite). The following methods have been devised.
Reads from and writes to undefined globals can be detected when they happen, at run-time. These approaches operate by overriding the __index and __newindex metamethods in the environment table of the currently running function. Lua routes reads of, and writes to, undefined global variables to these metamethods, which in turn can be programmed to raise run-time errors.
This approach is taken by the "strict" module in the Lua 5.1 distribution (etc/strict.lua). Alternatively, see [LuaStrict] by ThomasLauer for an extension of the strict approach.
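A minimal sketch of the metamethod approach, assuming Lua 5.1 or later. It is not etc/strict.lua: strict.lua lets the main chunk create globals by plain assignment and only complains inside functions, whereas this simplified version requires every new name to go through an explicit declare(name, value) call. make_strict_env, declare and load_in are illustrative names.

```lua
-- illustrative helper: compile `src` so that its globals live in `env`
local function load_in(src, env)
  if setfenv then                                    -- Lua 5.1 / LuaJIT
    local fn = assert(loadstring(src))
    setfenv(fn, env)
    return fn
  end
  return assert(load(src, "=strict-demo", "t", env)) -- Lua 5.2 and later
end

-- Simplified strict-style environment (not the real strict.lua).
local function make_strict_env(base)
  local declared = {}          -- names declared via declare(), even if nil
  local env = {}
  setmetatable(env, {
    -- only called when `name` has no value in env itself
    __index = function(_, name)
      local v = base[name]
      if v ~= nil or declared[name] then return v end
      error("variable '" .. tostring(name) .. "' is not declared", 2)
    end,
    -- only called when assigning to a name that has no value in env
    __newindex = function(t, name, value)
      if not declared[name] then
        error("assign to undeclared variable '" .. tostring(name) .. "'", 2)
      end
      rawset(t, name, value)
    end,
  })
  -- hypothetical declaration helper visible to the sandboxed code
  rawset(env, "declare", function(name, value)
    declared[name] = true
    rawset(env, name, value)
  end)
  return env
end

local env = make_strict_env(_G)

local ok1, r1 = pcall(load_in("declare('count', 1); count = count + 1; return count", env))
local ok2, e2 = pcall(load_in("return cuont", env))  -- read of a misspelled name
local ok3, e3 = pcall(load_in("total = 0", env))     -- write of an undeclared name
print(ok1, r1)
print(ok2, e2)
print(ok3, e3)
```

Passing level 2 to error makes the message point at the code that performed the access rather than at the metamethod, which is also what strict.lua does.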
Here are some advantages and disadvantages of this approach:
Advantages:
Disadvantages:
An alternative method is to detect undefined globals at compile time. Of course, Lua can be used as an interpreted language without an explicit compilation step (though internally it does compile to bytecode). What we mean here, however, is that undefined globals are detected before the code executes as normal: this can be done by merely parsing the code rather than executing it. This is sometimes called "static analysis" of source code.
To detect these at compile time you may (under a *nix-like operating system) use the following command-line trick with the Lua compiler (luac):
luac -p -l myprogram.lua | grep ETGLOBAL
This lists all gets and sets to global variables (both defined and undefined ones). You may find that some gets/sets are interpreted as globals when you really wanted them to be locals (a missing "local" statement or a misspelled variable name). The above approach works well if you follow a coding style of "avoiding globals like the plague" (i.e. using locals (lexicals) whenever possible).
An extension to this approach is in tests/globals.lua in the Lua 5.1.2 distribution, which implements the *nix pipe " | grep ETGLOBAL" in Lua instead, and does so more effectively by filtering out predefined globals (e.g. print, math, string, etc.). See also LuaList:2006-05/msg00306.html, as well as LuaLint.
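As a rough illustration of doing the grep step in Lua, the sketch below scans a luac listing and reports GETGLOBAL/SETGLOBAL entries whose names are not predefined. It is not the code of tests/globals.lua. It assumes the Lua 5.1 listing shape used in the hand-written sample (line number in brackets, opcode, then "; name"); Lua 5.2 and later list global accesses as GETTABUP/SETTABUP on _ENV and would not match. In real use the listing could be read with io.popen("luac -p -l myprogram.lua"), where io.popen is available.

```lua
-- Report global gets/sets in a Lua 5.1 `luac -l` listing, skipping any
-- name for which predefined[name] ~= nil.
local function undefined_globals(listing, predefined)
  local found = {}
  for line in listing:gmatch("[^\n]+") do
    -- assumed shape:  <pc>  [<line>]  GETGLOBAL  <args>  ; <name>
    local lineno, op, name =
      line:match("%[(%d+)%]%s+([GS]ETGLOBAL)%s.-;%s*([%a_][%w_]*)")
    if op and predefined[name] == nil then
      found[#found + 1] = string.format("line %s: %s %s", lineno, op, name)
    end
  end
  return found
end

-- Hand-written sample in the assumed format, for a file containing
--   print(X)
--   cuont = 1
local sample = table.concat({
  "\t1\t[1]\tGETGLOBAL\t0 -1\t; print",
  "\t2\t[1]\tGETGLOBAL\t1 -2\t; X",
  "\t3\t[1]\tCALL     \t0 2 1",
  "\t4\t[2]\tLOADK    \t0 -3\t; 1",
  "\t5\t[2]\tSETGLOBAL\t0 -4\t; cuont",
  "\t6\t[2]\tRETURN   \t0 1",
}, "\n")

-- _G serves as the list of predefined globals (print, math, string, ...)
for _, entry in ipairs(undefined_globals(sample, _G)) do
  print(entry)
end
```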
An external "linter" tool (see Metalua-based solutions below and LuaFish) that parses and statically analyzes Lua code can achieve a similar effect and also be used for detecting other classes of coding errors or questionable coding practices. For example, LuaFish (which is fairly experimental) can even detect that string:length() or math.cos("hello") are invalid.
[Lua Checker] (5.1) is one such tool, which analyzes Lua source for common programming errors, much as the "lint" program does for C. It contains a Lua 5.1 bison parser.
Another approach is to patch the Lua parser itself. See LuaList:2006-10/msg00206.html for such an example.
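/* based on 5.1.4 (lparser.c) */
static void singlevar (LexState *ls, expdesc *var) {
  TString *varname;
  FuncState *fs;
  check(ls, TK_NAME);          /* unlike str_checkname(), does not advance */
  varname = ls->t.seminfo.ts;
  fs = ls->fs;
  if (singlevaraux(fs, varname, var, 1) == VGLOBAL) {
    /* The name is a global: this is where a patch can reject it with
       luaX_syntaxerror(), while the name is still the current token. */
    var->u.s.info = luaK_stringK(fs, varname);  /* info points to global name */
  }
  luaX_next(ls);
  /* luaX_next should occur after any luaX_syntaxerror */
}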
Here are some advantages and disadvantages of this approach:
Advantages:
Disadvantages:
Hybrid approaches are possible. Note that detection of global variable accesses (at least direct ones, not through _G or getfenv()) is best done at compile time, while determination of whether those global variables are defined may best be done at run-time (or, possibly sufficiently, at "load time", around when loadfile is called). So a good compromise may be to split these two concerns and handle each when most appropriate.
A mixed approach is taken by the "checkglobals" module+patch (see below for details), which provides a checkglobals(f, env) function (implemented entirely in Lua). In short, checkglobals validates that the function f (which by default is taken to be the calling function) uses only global variables defined in the table env (which by default is taken to be the environment of f).
checkglobals requires a small patch to add an additional 'g' option to the debug library's
debug.getinfo / lua_getinfo function to list the global variable accesses lexically inside the
function f.
The checkglobals module + patch (from LuaPowerPatches -- [Download Patch for Lua 5.1.3]) is a hybrid of a compile-time and run-time approach for detecting undefined variables. Consider the following trivial Lua module:
-- multiplybyx.lua
local function multiplybyx(y)
  return y * X -- is X defined???
end
return multiplybyx
Is this code valid? Did we mistype x as X? Well, we can detect at compile time that X is a global variable, but whether X is a defined global variable cannot, in general, be known until run-time:
-- main.lua
local multiplybyx = dofile 'multiplybyx.lua'
X = 2
print(multiplybyx(5)) -- multiplybyx is valid
X = nil
print(multiplybyx(5)) -- multiplybyx is now not valid
So, we'll define a function checkglobals that determines whether all the globals "directly" referenced lexically inside the code of a given function (e.g. multiplybyx) are defined at the time checkglobals is called:
-- main.lua
local checkglobals = require 'checkglobals'
local multiplybyx = dofile 'multiplybyx.lua'
X = 2
checkglobals(multiplybyx) -- ok: multiplybyx is valid
print(multiplybyx(5))
X = nil
checkglobals(multiplybyx) -- fails: multiplybyx is not valid
print(multiplybyx(5))
$ lua main.lua
10
lua: main.lua:8: accessed undefined variable "X" at line 3
stack traceback:
[C]: in function 'error'
etc/checkglobals.lua:77: in function 'checkglobals'
main.lua:8: in main chunk
[C]: ?
The function checkglobals(f) operates by retrieving the environment table (env) of function f (known at run-time) and the list of all global get and set bytecodes (GETGLOBAL and SETGLOBAL) lexically inside f (known at compile-time). checkglobals verifies that env[varname] ~= nil for each global get or set with name varname; if this check fails, checkglobals raises an error. Unless the code was stripped (luac -s), the error also contains the line number at which the global variable was accessed.
The checkglobals function accepts some additional parameters that make it more flexible. Let's look at the comments in checkglobals.lua on it:
-- checkglobals.lua
-- Undeclared global variable detection for Lua.
--
-- This module consists of and returns a single function:
--
--   f = checkglobals(f, env)
--
-- In short, checkglobals validates that the function f uses only
-- global variables defined in the table env.
--
-- Often, checkglobals() is called without arguments. If f is
-- unspecified (nil), the calling function is used. If f is a number,
-- the function at stack level f is used (1 is the calling function).
-- If env is unspecified (nil), the environment of the calling
-- function is used.
--
-- The test passes only if all global variables "directly" read from
-- or written to lexically inside the function f (including functions
-- lexically nested in f) exist in the table env. That is,
-- env[varname] ~= nil for variable with name varname. Access to
-- globals "indirectly" via _G or getfenv() doesn't count.
--
-- On success, returns f. On failure, raises error. The error
-- contains a line number unless the source was stripped (luac -s).
--
-- This module requires a patched version of Lua that makes minor
-- additions to ldebug.c (lua_getinfo 'g' option) and ldblib.c
-- (debug.getinfo 'globals' field). Internally, it retrieves
-- GETGLOBAL and SETGLOBAL bytecodes.
--
-- This module can be used in various ways including...
--
-- Usage mode #1: Define globals, then check.
--
--   -- foo.lua
--   x = 1
--   function foo() x = x + 1; print(x) end
--   function bar() X = X + 1; print(X) end -- oops!
--   foo()
--   require 'checkglobals' ()
--
-- Usage mode #2: Check, then define only locals.
--
--   -- foo.lua
--   require 'checkglobals' ()
--   local x = 1
--   local function foo() x = x + 1; print(x) end
--   local function bar() X = X + 1; print(X) end -- oops!
--   foo()
--
-- Usage mode #3: Check specified function.
--
--   -- foo.lua
--   local checkglobals = require 'checkglobals'
--   function foo()
--     print(mAtH.pi) -- oops!
--   end
--   checkglobals(foo)
--   foo()
--
-- David Manura, 2008.
-- Licensed under the same terms as Lua itself (MIT License).
The implementation of this module (on the Lua side) is this:
-- copy in case a sandbox removes these
local getinfo = debug.getinfo
local type = type
local getfenv = getfenv
local error = error

local function checkglobals(f, env)
  local fp = f or 1
  if type(fp) == 'number' then fp = fp + 1 end
  env = env or getfenv(2)
  local gref = getinfo(fp, 'g').globals
  local ncols = gref.ncols
  for i=1,#gref,ncols do
    local name = gref[i+1]
    -- the line-number column is absent (ncols == 2) for stripped code
    local linenum = ncols == 3 and gref[i+2] or nil
    if env[name] == nil then
      error('accessed undefined variable "' .. name .. '"' ..
            (linenum and ' at line ' .. linenum or ''), 2)
    end
  end
  return f
end

checkglobals() -- check oneself :)

return checkglobals
This code makes use of a patched debug.getinfo that supports a new "g" ("globals") option that returns the list of all globals accessed lexically inside the given function (including functions lexically nested inside that function). gref = getinfo(fp, 'g').globals is an array. For each global accessed, the following values are appended to the array: the access type ("GETGLOBAL" or "SETGLOBAL"), the variable name (as a string), and the line number (if source was not stripped). There is also a field gref.ncols equal to the number of columns (2 or 3) represented in the flat array.
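To make the flat layout concrete, here is a small sketch that splits such an array into one record per access, honouring ncols. globals_rows is a hypothetical helper, and the two input tables are written by hand in the layout described above rather than produced by the patch.

```lua
-- Hypothetical helper: flat globals array -> list of {op, name, line}.
local function globals_rows(gref)
  local rows, ncols = {}, gref.ncols
  for i = 1, #gref, ncols do
    rows[#rows + 1] = {
      op   = gref[i],                              -- "GETGLOBAL" or "SETGLOBAL"
      name = gref[i + 1],                          -- variable name
      line = (ncols == 3) and gref[i + 2] or nil,  -- missing if stripped
    }
  end
  return rows
end

-- Unstripped code: three columns per access.
local with_lines = { ncols = 3, "GETGLOBAL", "print", 3, "SETGLOBAL", "X", 4 }
-- Stripped code (luac -s): two columns per access.
local stripped = { ncols = 2, "GETGLOBAL", "print", "SETGLOBAL", "X" }

for _, r in ipairs(globals_rows(with_lines)) do
  print(r.op, r.name, r.line)
end
```

Stepping by ncols is what keeps the rows aligned: reading three values per access from a two-column array would pick up the next access's opcode as a line number.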
Below are some examples of possible ways to use the module:
Example:
-- factorial.lua
function factorial(k)
  if k == 1 then
    return K -- oops!
  else
    return k * factorial(k-1)
  end
end

function main()
  print(factorial(10))
end

require 'checkglobals' () -- fails since K is undefined
main()
Example:
-- factorial.lua
require 'checkglobals' () -- fails since K is undefined
-- note: no new globals can be "directly" defined beyond this point
-- (though via _G and getfenv() is ok).

local function factorial(k)
  if k == 1 then
    return K -- oops!
  else
    return k * factorial(k-1)
  end
end

local function main()
  print(factorial(10))
end

main()
Example:
-- factorial.lua
local M = {}

local function factorial(k)
  if k == 1 then
    return K -- oops!
  else
    return k * factorial(k-1)
  end
end
M.factorial = factorial

require 'checkglobals' ()

return M
Below is the patch made to Lua's debugging module:
diff -urN lua-5.1.3/src/ldblib.c lua-5.1.3-checkglobals/src/ldblib.c
--- lua-5.1.3/src/ldblib.c	2008-01-21 08:11:21.000000000 -0500
+++ lua-5.1.3-checkglobals/src/ldblib.c	2008-03-22 21:56:56.227500000 -0400
@@ -136,6 +136,11 @@
     treatstackoption(L, L1, "activelines");
   if (strchr(options, 'f'))
     treatstackoption(L, L1, "func");
+
+  /* PATCH - checkglobals */
+  if (strchr(options, 'g'))
+    treatstackoption(L, L1, "globals");
+
   return 1;  /* return table */
 }
 
diff -urN lua-5.1.3/src/ldebug.c lua-5.1.3-checkglobals/src/ldebug.c
--- lua-5.1.3/src/ldebug.c	2007-12-28 10:32:23.000000000 -0500
+++ lua-5.1.3-checkglobals/src/ldebug.c	2008-03-22 21:59:35.399375000 -0400
@@ -219,6 +219,10 @@
         }
         break;
       }
+
+      /* PATCH - checkglobals */
+      case 'g':
+
       case 'L':
       case 'f':  /* handled by lua_getinfo */
         break;
@@ -229,6 +233,31 @@
 }
 
 
+/* PATCH - checkglobals */
+static void auxgetinfoglobals(lua_State *L, Proto *p, Table *t, int *c) {
+  TValue *k = p->k;
+  int j;
+  for (j = 0; j < p->sizecode; j++) {
+    const Instruction i = p->code[j];
+    OpCode op = GET_OPCODE(i);
+    const TValue *ts;
+    if (op != OP_GETGLOBAL && op != OP_SETGLOBAL)
+      continue;
+    ts = k+GETARG_Bx(i);
+    lua_assert(ttisstring(ts));
+    setobj2t(L, luaH_setnum(L, t, (*c)++),
+                (L->top - (op == OP_GETGLOBAL ? 2 : 1)));
+    setobj2t(L, luaH_setnum(L, t, (*c)++), ts);
+    if(p->lineinfo) {
+      setnvalue(luaH_setnum(L, t, (*c)++), p->lineinfo[j]);
+    }
+  }
+  for (j = 0; j < p->sizep; j++) { /* lexically nested functions */
+    auxgetinfoglobals(L, p->p[j], t, c);
+  }
+}
+
+
 LUA_API int lua_getinfo (lua_State *L, const char *what, lua_Debug *ar) {
   int status;
   Closure *f = NULL;
@@ -254,6 +283,22 @@
   }
   if (strchr(what, 'L'))
     collectvalidlines(L, f);
+
+  /* PATCH - checkglobals */
+  if (strchr(what, 'g')) {
+    lua_newtable(L);
+    if (f != NULL && !f->c.isC) {
+      Table *t = hvalue(L->top-1);
+      int c = 1;
+      lua_pushnumber(L, f->l.p->lineinfo ? 3 : 2);
+      lua_setfield(L, -2, "ncols");
+      lua_pushliteral(L, "GETGLOBAL");
+      lua_pushliteral(L, "SETGLOBAL");
+      auxgetinfoglobals(L, f->l.p, t, &c);
+      lua_pop(L, 2);
+    }
+  }
+
   lua_unlock(L);
   return status;
 }
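The patch is rather simple and quite isolated. It only makes additions (no deletions) to lua_getinfo and debug.getinfo to support the new "g" ("globals") option. One caveat: lua_getinfo pushes the globals table after the results of the 'f' and 'L' options, but the ldblib.c hunk picks up "globals" after "activelines" and "func", so a single debug.getinfo call that combines 'g' with 'f' or 'L' would store the values under the wrong field names. checkglobals only requests 'g' on its own, so it is unaffected.
The new "g" option may have uses elsewhere, so this might be a useful addition to Lua's debug module. The list of globals that a function accesses can be considered part of the function's interface, which is a very fundamental aspect of what the function is. Reflection/introspection is largely about accessing information on interfaces.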
This "g" option may alternately be defined in terms of lhf's bytecode inspector library (lbci)[1]:
local getinstruction = inspector.getinstruction
local getfunction = inspector.getfunction
local getconstant = inspector.getconstant
local inf = math.huge
local type = type

-- append (op, name, linenum) for every global access in f and its nested functions
local function auxgetglobals(f, gref)
  for i=1,inf do
    local linenum,op,_,idx = getinstruction(f, i)
    if not op then break end
    if op == 'GETGLOBAL' or op == 'SETGLOBAL' then
      local name = getconstant(f, -idx)
      gref[#gref+1] = op
      gref[#gref+1] = name
      gref[#gref+1] = linenum -- may be nil
    end
  end
  for i=1,inf do
    local f2 = getfunction(f,i)
    if not f2 then break end
    auxgetglobals(f2, gref)
  end
end

local function getglobals(f)
  local gref = {}
  auxgetglobals(f, gref)
  local haslines = type(gref[3]) == 'number'
  gref.ncols = haslines and 3 or 2
  return gref
end

local orig_getinfo = debug.getinfo
function debug.getinfo(a,b,c)
  local thread,f,what
  if type(a) == 'thread' then
    thread,f,what = a,b,c
  else
    f,what = a,b
    -- levels on our own stack are one deeper inside this wrapper
    if type(f) == 'number' then f = f + 1 end
  end
  local globals
  if what and what:find 'g' then
    what = what:gsub('g', '')
    local fp = f
    if type(f) == 'number' then
      local info
      if thread then info = orig_getinfo(thread, f, 'f')
      else info = orig_getinfo(f, 'f') end
      fp = info and info.func
    end
    if fp then globals = getglobals(fp) end
  end
  local t
  if thread then t = orig_getinfo(thread, f, what)
  else t = orig_getinfo(f, what) end
  if t then t.globals = globals end
  return t
end
What are the advantages/disadvantages/caveats to checkglobals? Here are some qualities of it:
checkglobals is called and the function that was validated is called. Note that you may call checkglobals more than once (e.g. after creating new globals).
The checkglobals approach may be combined with the strict approach for the strongest validation.
checkglobals requires a patch to lua_getinfo and debug.getinfo to support the new "g" ("globals") option used by checkglobals.lua. This patch is entirely backwards compatible and rather isolated, and it might be useful for other purposes as well.
checkglobals is written entirely in Lua and can be customized.
checkglobals (like the static analysis approaches) assumes that a function has a single, non-changing environment. It also assumes that lexically nested functions have the same environment as the parent function, although this restriction might be relaxed with an additional parameter that causes checkglobals to ignore lexically nested functions: checkglobals(f,env,'norecurse'); that will also require an extension to the debug.getinfo patch. See LuaList:2008-03/msg00598.html for details.
See also the mailing list discussion: LuaList:2008-03/msg00440.html.
The following utility will lint Lua source code, detecting undefined variables (and could be expanded to do other interesting things).
-- lint.lua - A lua linter.
--
-- Warning: This is a work in progress. Not currently well tested.
--
-- This relies on Metalua 0.2 ( http://metalua.luaforge.net/ )
-- libraries (but doesn't need to run under Metalua).
-- The metalua parsing is a bit slow, but does the job well.
--
-- Usage:
--   lua lint.lua myfile.lua
--
-- Features:
--   - Outputs list of undefined variables used.
--     (note: this works well for locals, but globals requires
--      some guessing)
--   - TODO: add other lint stuff.
--
-- David Manura, 2007-03
-- Licensed under the same terms as Lua itself.

-- Capture default list of globals.
local globals = {}; for k,v in pairs(_G) do globals[k] = "global" end

-- Metalua imports
require "mlp_stat"
require "mstd"  --debug
require "disp"  --debug

local filename = assert(arg[1])

-- Load source.
local fh = assert(io.open(filename))
local source = fh:read("*a")
fh:close()

-- Convert source to AST (syntax tree).
local c = mlp.block(mll.new(source))

--Display AST.
--print(tostringv(c))
--print(disp.ast(c))
--print("---")
--for k,v in pairs(c) do print(k,disp.ast(v)) end

-- Helper function: Parse current node in AST recursively.
function traverse(ast, scope, level)
  level = level or 1
  scope = scope or {}

  local blockrecurse

  if ast.tag == "Local" or ast.tag == "Localrec" then
    local vnames, vvalues = ast[1], ast[2]
    for i,v in ipairs(vnames) do
      assert(v.tag == "Id")
      local vname = v[1]
      --print(level, "deflocal",v[1])
      local parentscope = getmetatable(scope).__index
      parentscope[vname] = "local"
    end
    blockrecurse = 1
  elseif ast.tag == "Id" then
    local vname = ast[1]
    --print(level, "ref", vname, scope[vname])
    if not scope[vname] then
      print(string.format("undefined %s at line %d", vname, ast.line))
    end
  elseif ast.tag == "Function" then
    local params = ast[1]
    local body = ast[2]
    for i,v in ipairs(params) do
      local vname = v[1]
      assert(v.tag == "Id" or v.tag == "Dots")
      if v.tag == "Id" then
        scope[vname] = "local"
      end
    end
    blockrecurse = 1
  elseif ast.tag == "Let" then
    local vnames, vvalues = ast[1], ast[2]
    for i,v in ipairs(vnames) do
      local vname = v[1]
      local parentscope = getmetatable(scope).__index
      parentscope[vname] = "global" -- note: imperfect
    end
    blockrecurse = 1
  elseif ast.tag == "Fornum" then
    local vname = ast[1][1]
    scope[vname] = "local"
    blockrecurse = 1
  elseif ast.tag == "Forin" then
    local vnames = ast[1]
    for i,v in ipairs(vnames) do
      local vname = v[1]
      scope[vname] = "local"
    end
    blockrecurse = 1
  end

  -- recurse (depth-first search through AST)
  for i,v in ipairs(ast) do
    if i ~= blockrecurse and type(v) == "table" then
      local scope = setmetatable({}, {__index = scope})
      traverse(v, scope, level+1)
    end
  end
end

-- Default list of defined variables.
local scope = setmetatable({}, {__index = globals})

traverse(c, scope) -- Start check.
Example:
-- test1.lua
local y = 5
local function test(x)
  print("123",x,y,z)
end

local factorial
function factorial(n)
  return n == 1 and 1 or n * factorial(n-1)
end

g = function(w) return w*2 end

for k=1,2 do print(k) end
for k,v in pairs{1,2} do print(v) end

test(2)
print(g(2))
Output:
$ lua lint.lua test1.lua
undefined z at line 4
Another, more Metalua-ish (and possibly better) Metalua implementation given by Fabien is in [2], and an even simpler one is below. See also MetaLua info.
Something similar could be done using other Lua parsers (see LuaGrammar and in particular LpegRecipes). LuaFish and Leg [3] provide alternatives in Lua.
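The core trick in lint.lua is that each nested scope is a table whose metatable __index points at the enclosing scope, so a name lookup walks outwards until it reaches the table of known globals. The toy sketch below shows just that trick on a hand-built tree; the node shapes only loosely imitate Metalua's AST and do not come from Metalua, and Id and walk are illustrative names.

```lua
local function Id(name, line) return { tag = "Id", name, line = line } end

-- Report every Id not visible in `scope` (scopes chain via __index).
local function walk(node, scope, report)
  if node.tag == "Id" then
    if not scope[node[1]] then report(node[1], node.line) end
  elseif node.tag == "Local" then
    -- Local{ {names...}, {values...} }: check the values first, then
    -- make the names visible to later statements in the same scope
    for _, v in ipairs(node[2]) do walk(v, scope, report) end
    for _, id in ipairs(node[1]) do scope[id[1]] = "local" end
  elseif node.tag == "Function" then
    -- Function{ {params...}, {body statements...} }: new nested scope
    local inner = setmetatable({}, { __index = scope })
    for _, p in ipairs(node[1]) do inner[p[1]] = "local" end
    for _, stat in ipairs(node[2]) do walk(stat, inner, report) end
  else
    -- any other node (Block, Call, ...): visit its table children
    for _, child in ipairs(node) do
      if type(child) == "table" then walk(child, scope, report) end
    end
  end
end

-- Hand-built tree for:
--   1  local y = 5
--   2  local test = function(x)
--   3    print("123", x, y, z)
--   4  end
local ast = { tag = "Block",
  { tag = "Local", { Id("y", 1) }, { { tag = "Number", 5 } } },
  { tag = "Local", { Id("test", 2) }, {
    { tag = "Function", { Id("x", 2) }, {
      { tag = "Call", Id("print", 3), { tag = "String", "123" },
        Id("x", 3), Id("y", 3), Id("z", 3) },
    } },
  } },
}

local globals = { print = "global" }               -- assumed predefined names
local top = setmetatable({}, { __index = globals })
local problems = {}
walk(ast, top, function(name, line)
  problems[#problems + 1] = string.format("undefined %s at line %d", name, line)
end)
for _, p in ipairs(problems) do print(p) end       -- undefined z at line 3
```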
This piece of Metalua code uses the standard walker libraries to print a list of all global variables used in the program where it's inserted:
-{ block:
require 'walk.id' -- Load scope-aware walker library
-- This function lists all the free variables used in `ast'
function list_globals (ast)
-- Free variable names will be accumulated as keys in table `globals'
local walk_cfg, globals = { id = { } }, { }
function walk_cfg.id.free(v) globals[v[1]] = true end
walk_id.block(walk_cfg, ast)
-- accumulate global var names in the table "globals"
print "Global vars used in this chunk:"
for v in keys(globals) do print(" - "..v) end
end
-- Hook the globals lister after the generation of a chunk's AST:
mlp.chunk.transformers:add(list_globals) }
"Metalint [4] is a utility that checks Lua and Metalua source files for global variables usage. Beyond checking toplevel global variables, it also checks fields in modules: for instance, it will catch typos such as taable.insert(), but also table.iinsert(). Metalint works with declaration files, which list which globals are declared, and what can be done with them...." [4]
LocalDeclaration

The code below, written by Niklas Frykholm, was found in the Lua mailing list archive. I thought it would be nice to document it in the wiki, as gems like this can easily be lost or forgotten amongst the hundreds of mails. The idea behind enforcing local variable declaration is to stop yourself from using a variable that hasn't been declared. In effect, this also stops you from accidentally using an undeclared variable that was meant to be local in scope but gets treated as global, which can come back to haunt you while debugging.
There are many effective solutions for enforcing variable declaration; however, I have personally found Niklas Frykholm's solution to be the most elegant and unobtrusive (it also costs hardly any performance, as most variables declared in programs are local in scope and the code only gets hit when declaring global variables).
Basically, once you call GLOBAL_lock(_G) (note that _G is the table of global variables) somewhere in your code,
from that point onwards any attempt to assign to a variable that was not explicitly declared 'local' (and does not already exist as a global)
makes Lua raise an error. Reads of undefined globals are not caught; they still evaluate to nil.
I have made a slight modification to the code so that globals can still be declared explicitly
by prefixing variable names with a double underscore (e.g. __name, __global_count); you may choose to change the
code to use another naming convention to suit your own taste (e.g. G_name, G_global_count).
--===================================================
--=  Niklas Frykholm
-- basically if user tries to create global variable
-- the system will not let them!!
-- call GLOBAL_lock(_G)
--
--===================================================
function GLOBAL_lock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = lock_new_index
  setmetatable(t, mt)
end

--===================================================
-- call GLOBAL_unlock(_G)
-- to change things back to normal.
--===================================================
function GLOBAL_unlock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = unlock_new_index
  setmetatable(t, mt)
end

function lock_new_index(t, k, v)
  if (k~="_" and string.sub(k,1,2) ~= "__") then
    GLOBAL_unlock(_G)
    error("GLOBALS are locked -- " .. k ..
          " must be declared local or prefix with '__' for globals.", 2)
  else
    rawset(t, k, v)
  end
end

function unlock_new_index(t, k, v)
  rawset(t, k, v)
end
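A self-contained sketch of the same policy, applied to a private table instead of _G so it can be tried without disturbing the real globals. lock and unlock are illustrative names; unlike lock_new_index above, this version does not unlock the table when it raises the error.

```lua
-- Reject creation of new keys in `t`, except "_" and names starting "__".
local function lock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = function(tbl, k, v)
    if k ~= "_" and string.sub(k, 1, 2) ~= "__" then
      error("GLOBALS are locked -- " .. k ..
            " must be declared local or prefix with '__' for globals.", 2)
    end
    rawset(tbl, k, v)
  end
  setmetatable(t, mt)
end

-- Remove the hook so that plain assignment works again.
local function unlock(t)
  local mt = getmetatable(t)
  if mt then mt.__newindex = nil end
end

local vars = { existing = 1 }
lock(vars)

vars.existing = 2     -- fine: key already present, __newindex is not called
vars.__count = 0      -- fine: the "__" prefix is allowed
local ok, err = pcall(function() vars.tmp = 1 end)   -- rejected
print(ok, err)

unlock(vars)
vars.tmp = 1          -- allowed again
```

Note that __newindex only fires for keys that are currently absent, which is why updating an existing entry is never blocked.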
--SamLie
Here's a quick and crude solution to prevent assignment to undefined globals, in Lua 4.0:
function undefed_global(varname, newvalue)
  error("assignment to undefined global " .. varname)
end

function guard_globals()
  settagmethod(tag(nil), "setglobal", undefed_global)
end
Once guard_globals() has been called, any assignment to a global with a nil value will generate an error. So typically you would call guard_globals() after you've loaded your scripts, and before you run them. For example:
SomeVariable = 0

function ClearVariable()
  SomeVariabl = 1 -- typo here
end

-- now demonstrate that we catch the typo
guard_globals()

ClearVariable() -- generates an error at the typo line
The "getglobal" tag method can similarly be used to catch reads of undefined globals. Also, with more code, a separate table can be used to distinguish between "defined" globals that happen to have a nil value, and "undefined" globals which have never been accessed before.
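In Lua 5.x, tag methods were replaced by metatables, so the same idea maps onto __newindex (for "setglobal") and __index (for "getglobal") on the environment table. The sketch below is an assumed translation, not code from the original author: it guards a private environment table (loaded via an illustrative load_in helper) instead of _G, and treats every name that exists when guard_globals is called as defined.

```lua
-- illustrative helper: compile `src` so that its globals live in `env`
local function load_in(src, env)
  if setfenv then                                   -- Lua 5.1 / LuaJIT
    local fn = assert(loadstring(src))
    setfenv(fn, env)
    return fn
  end
  return assert(load(src, "=guard-demo", "t", env)) -- Lua 5.2 and later
end

-- After this call, reading or assigning any name absent from env errors.
local function guard_globals(env)
  setmetatable(env, {
    __index = function(_, name)
      error("read of undefined global " .. tostring(name), 2)
    end,
    __newindex = function(_, name)
      error("assignment to undefined global " .. tostring(name), 2)
    end,
  })
end

-- Load the script first (defining its globals), then guard, then run.
local env = {}
load_in([[
  SomeVariable = 0
  function ClearVariable()
    SomeVariabl = 1 -- typo here
  end
]], env)()

guard_globals(env)
local ok, err = pcall(env.ClearVariable)                    -- write caught
local ok2, err2 = pcall(function() return env.Undefined end) -- read caught
print(ok, err)
print(ok2, err2)
```

As with the Lua 4.0 version, a global that is later set to nil counts as undefined again, because assigning nil removes the key from the table.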