Detecting Undefined Variables

The Question

"How can access to [undefined variables] (undeclared variables) be caught in Lua?" is a frequently asked question.

Various approaches have been used, as described below. These approaches differ in when and how access to undefined global variables is detected. First, let's consider the nature of the problem...

The Problem

In Lua programs, typos in variable names can be hard to spot because, in general, Lua will not complain that a variable is undefined. For example, consider this program that defines two functions:

function f(x) print(X) end
function g(x) print(X + 1) end

Lua gives no error when loading this code. The first two lines might be wrong (e.g. "x" mistyped as "X") or they might not be (maybe X is some other global variable). In fact, Lua has no way of knowing whether the code is wrong. The reason is that if a variable is not recognized by Lua as a local variable (e.g. by static declaration of the variable using a "local" keyword or function parameter definition), the variable is instead interpreted as a global variable (as is the case for "X"). Whether a global variable is defined is not as easy to determine or describe. X has the value t['X'] where t = getfenv() is the "environment table" of the currently running function (getfenv is specific to Lua 5.1; Lua 5.2 and later expose the environment as _ENV). X always has a value, though it is probably nil if X was a typo. We might interpret X being nil as X being undefined, but whether X is nil can only be determined at run-time. For example:

        -- X is "undefined"
f(X)    -- prints nil
X = 2   -- X is defined
f(X)    -- prints 2
X = nil -- X is "undefined" again
f(X)    -- prints nil

Even the above runs without error. When X is nil, print(X) becomes print(nil), and it is valid to print a nil value. However, consider calling the function g:

g(X)

This fails with the error "attempt to perform arithmetic on global 'X' (a nil value)". The reason is that print(X + 1) becomes print(nil + 1), and it is invalid to add nil to a number. The error is not observed, however, until the code nil + 1 actually executes.
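The delayed failure can be observed directly with pcall, which turns the run-time error into a return value. The following is a minimal sketch, assuming the global X is not set anywhere else in the program; it uses a variant of g that returns its result instead of printing it, and the exact wording of the error message differs between Lua versions.

```lua
-- Assumes the global X is unset when this chunk starts.
local function g(x) return X + 1 end   -- "X" is the global, not the parameter x

-- Defining g raised no error; the failure appears only when the addition runs.
local ok, err = pcall(g, 1)
print(ok)     -- false
print(err)    -- message text depends on the Lua version

-- Once X holds a number, the very same function succeeds.
X = 2
print(pcall(g, 1))   -- true   3
X = nil              -- restore the "undefined" state
```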

Obviously, we may want to detect undefined global variables more proactively, such as detecting them at compile time or at least prior to production release (e.g. inside a test suite). The following methods have been devised.

Approach #1: Run-time Checking

Reads from and writes to undefined globals can be detected when they happen, at run-time. These approaches operate by overriding the __index and __newindex metamethods in the environment table of the currently running function. Lua sends reads and writes to undefined global variables to these metamethods that in turn can be programmed to raise run-time errors.
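As a rough illustration of the mechanism (not the code of etc/strict.lua), the sketch below installs both metamethods on an ordinary table standing in for the environment. The names make_strict, declare and env are hypothetical and chosen only for this example.

```lua
-- Hypothetical helper: make reads and writes of undeclared keys in `env`
-- raise errors. Keys already present in `env` count as declared.
local function make_strict(env)
  local declared = {}
  for name in pairs(env) do declared[name] = true end

  local mt = {}
  -- Invoked only when `name` is absent from env (its value is nil).
  function mt.__index(_, name)
    if not declared[name] then
      error("read of undefined variable '" .. tostring(name) .. "'", 2)
    end
    return nil                         -- declared, but currently nil
  end
  -- Invoked only when assigning to a key that is absent from env.
  function mt.__newindex(t, name, value)
    if not declared[name] then
      error("write to undeclared variable '" .. tostring(name) .. "'", 2)
    end
    rawset(t, name, value)
  end
  setmetatable(env, mt)

  -- The returned function declares a name so that it may hold any value, even nil.
  return function(name) declared[name] = true end
end

local env = { print = print }
local declare = make_strict(env)

declare("count")
env.count = 1                                   -- fine: declared first
print(pcall(function() return env.cuont end))   -- false, plus an error message
print(pcall(function() env.total = 0 end))      -- false, plus an error message
```

Passing _G instead of a scratch table would apply the same checks to real global variables; the scratch table is used here only to keep the example from affecting the rest of a program.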

This approach is taken by the "strict" module in the Lua distribution (etc/strict.lua; downloads for [Lua 5.1] and [Lua 5.2]). Alternatively, see [LuaStrict] by ThomasLauer for an extension of the strict approach.

Here are some advantages and disadvantages of this approach:

Advantages:

Disadvantages:

The below was moved from EnforcingLocalDeclaration

The code below, written by Niklas Frykholm, was found in the Lua mail archive. I thought it would be nice to document it in the wiki, as gems like this can easily be lost or forgotten amongst the hundreds of mails. The idea behind enforcing local variable declaration is to stop yourself from using a variable that hasn't been declared. In effect, this also stops you from accidentally using an undeclared variable that was meant to be local in scope but gets treated as global, which can come back and haunt you while debugging.

SR - Could you explain what this solution offers that DetectingUndefinedVariables does not? Are you aware of etc/strict.lua, but believe this approach to be better?

There are many effective solutions to enforcing variable declaration; however, I have personally found Niklas Frykholm's solution to be the most elegant and unintrusive (it is also hardly a hit on performance, as most variables declared in programs are local in scope and the code only gets hit when declaring global variables).

Basically, once you call GLOBAL_lock(_G) (note the _G is the global variables table) somewhere in your code, from that point onwards any attempt to assign to a new global variable (a name that is neither declared 'local' nor already defined as a global) raises an error.

I have made a slight modification to the code to enable the convenience for one to also explicitly allow global declarations by prefixing variables with double underscore (eg. __name, __global_count), however you may choose to change the code for another naming method to suit your own taste (eg G_name, G_global_count). (Question from a reader: does this on-the-fly declaration of global variables prefixed with "__" not once again enable typos - i.e. setting __valueX and __valueX are both accepted as legal, kind of defying (a large part of) the original idea?)


--===================================================
--=  Niklas Frykholm 
-- basically if user tries to create global variable
-- the system will not let them!!
-- call GLOBAL_lock(_G)
--
--===================================================
function GLOBAL_lock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = lock_new_index
  setmetatable(t, mt)
end

--===================================================
-- call GLOBAL_unlock(_G)
-- to change things back to normal.
--===================================================
function GLOBAL_unlock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = unlock_new_index
  setmetatable(t, mt)
end

function lock_new_index(t, k, v)
  if (k~="_" and string.sub(k,1,2) ~= "__") then
    GLOBAL_unlock(_G)
    error("GLOBALS are locked -- " .. k ..
          " must be declared local or prefix with '__' for globals.", 2)
  else
    rawset(t, k, v)
  end
end

function unlock_new_index(t, k, v)
  rawset(t, k, v)
end

--SamLie
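To see the effect without touching the real globals, the lock can be tried on a scratch table. The sketch below is a condensed restatement written for this example (the names lock and env are hypothetical); unlike the listing above, it does not unlock the table after the first error.

```lua
-- Condensed restatement of the locking idea, applied to a scratch table.
local function lock(t)
  local mt = getmetatable(t) or {}
  mt.__newindex = function(tbl, k, v)
    if k ~= "_" and string.sub(k, 1, 2) ~= "__" then
      error("GLOBALS are locked -- " .. k ..
            " must be declared local or prefix with '__' for globals.", 2)
    end
    rawset(tbl, k, v)
  end
  setmetatable(t, mt)
end

local env = { existing = 1 }
lock(env)

env.existing = 2     -- allowed: the key already exists, so __newindex is not consulted
env.__counter = 0    -- allowed: "__" prefix
env._ = "scratch"    -- allowed: "_" is exempt
local ok, err = pcall(function() env.total = 10 end)
print(ok, err)       -- false, followed by the "GLOBALS are locked" message
print(env.total)     -- nil: the assignment was rejected
```

Only assignments are intercepted here, since just __newindex is set; reading a misspelled name still quietly yields nil.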

Approach #2: Static Analysis (Compile-time Checking)

An alternative method is to detect undefined globals at compile time. Of course, Lua can be used as an interpreted language without an explicit compilation step (though internally it does compile to bytecode). What we mean by this, however, is that undefined globals are detected before the code executes as normal. It can be done without really executing all the code but rather only parsing it. This is sometimes called "static analysis" of source code.

To detect these at compile time you may (under a *nix-like operating system) use the following command-line trick with the Lua 5.1 compiler (luac):

luac -p -l myprogram.lua | grep ETGLOBAL

For Lua 5.2:

luac -p -l myprogram.lua | grep 'ETTABUP.*_ENV'

This lists all gets and sets to global variables (both defined and undefined ones). You may find that some gets/sets are interpreted as globals when you really wanted them to be locals (missing "local" statement or misspelling variable name). The above approach works well if you follow a coding style of "avoiding globals like the plague" (i.e. using locals (lexicals) whenever possible).

An extension to this approach is in tests/globals.lua in the Lua 5.1.2 distribution, which implements the *nix pipe " | grep ETGLOBAL" instead in Lua and does so more effectively by filtering out pre-defined globals (e.g. print, math, string, etc.). See also LuaList:2006-05/msg00306.html, as well as LuaLint. Also see Egil Hjelmeland's [globals]. A more advanced version of globals.lua is [globalsplus.lua] (DavidManura), which looks in fields of global tables too. A yet more advanced bytecode analysis is done in [lglob] [3] (SteveDonovan).
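The filtering step can be sketched in a few lines of Lua. This is a rough illustration of the idea rather than the code of globals.lua: the function name unknown_globals is hypothetical, the pattern assumes Lua 5.1 opcode names (GETGLOBAL/SETGLOBAL), and the sample listing is hand-written in the style of luac output instead of being captured from a real run.

```lua
-- Names treated as predefined; extend this set to suit the program being checked.
local predefined = {}
for name in pairs(_G) do predefined[name] = true end

-- Scan a `luac -p -l` listing and collect accesses to globals not in `known`.
local function unknown_globals(listing, known)
  local found = {}
  for line in string.gmatch(listing, "[^\n]+") do
    -- Capture the source line number, G (get) or S (set), and the name after ";".
    local lineno, op, name =
      string.match(line, "%[(%d+)%]%s*([GS])ETGLOBAL.-;%s+(%S+)")
    if name and not known[name] then
      found[#found + 1] =
        { name = name, line = tonumber(lineno), write = (op == "S") }
    end
  end
  return found
end

-- Illustrative listing text (an assumption about the format, not real output).
local sample = table.concat({
  "\t1\t[1]\tGETGLOBAL\t0 -1\t; print",
  "\t2\t[1]\tGETGLOBAL\t1 -2\t; X",
  "\t3\t[1]\tCALL     \t0 2 1",
  "\t4\t[2]\tSETGLOBAL\t0 -3\t; conut",
}, "\n")

for _, g in ipairs(unknown_globals(sample, predefined)) do
  print(g.name, g.line, g.write and "write" or "read")
end
```

With a standard set of globals, the loop reports the read of X on line 1 and the write to conut on line 2, while print is filtered out. In practice the listing would be piped in from luac rather than embedded as a string.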

An external "linter" tool or semantically aware text editor (like [Lua for IntelliJ IDEA], LuaInspect, the older LuaFish, or the Metalua code below) that parses and statically analyzes Lua code can achieve a similar effect, as well as detecting other classes of coding errors or questionable coding practices. For example, LuaFish (which is fairly experimental) can even detect that string:length() or math.cos("hello") are invalid.

[Lua Checker] (5.1) is one such tool, which analyzes Lua source for common programming errors, much as the "lint" program does for C. It contains a Lua 5.1 bison parser.

love-studio [OptionalTypeSystem] allows type annotations in regular Lua comments:

-- this is a description
-- @param(a : number) some parameter 
-- @ret(number) first return value
-- @ret(string) second return value
function Thing:Method(a)
        return 3,"blarg"
end

--@var(number) The x coordinate
--@var(number) The y coordinate
local x,y = 0,0

It is described as follows: an "optional type system (as defined by Gilad Bracha in his paper Pluggable Type Systems) is a type system that a.) has no effect on the run-time semantics of the programming language, and b.) does not mandate type annotations in the syntax."

Another approach is to patch the Lua parser itself. See LuaList:2006-10/msg00206.html for such an example.

Note: modify lparser.c:singlevar as follows for more correct error handling: --DavidManura
/* based on 5.1.4 */
static void singlevar (LexState *ls, expdesc *var) {
  TString *varname;
  FuncState *fs;
  check(ls, TK_NAME);
  varname = ls->t.seminfo.ts;
  fs = ls->fs;
  singlevaraux(fs, varname, var, 1);
  luaX_next(ls);
  /* luaX_next should occur after any luaX_syntaxerror */
}

Here are some advantages and disadvantages of this approach:

Advantages:

Disadvantages:

A Lua Lint Tool

The following utility will lint Lua source code, detecting undefined variables (and could be expanded to do other interesting things).

-- lint.lua - A Lua linter.
--
-- Warning: This is a work in progress.  Not currently well tested.
--
-- This relies on Metalua 0.2 ( http://metalua.luaforge.net/ )
-- libraries (but doesn't need to run under Metalua).
-- The metalua parsing is a bit slow, but does the job well.
--
-- Usage:
--   lua lint.lua myfile.lua
--
-- Features:
--   - Outputs list of undefined variables used.
--     (note: this works well for locals, but globals requires
--      some guessing)
--   - TODO: add other lint stuff.
--
-- David Manura, 2007-03
-- Licensed under the same terms as Lua itself.

-- Capture default list of globals.
local globals = {}; for k,v in pairs(_G) do globals[k] = "global" end

-- Metalua imports
require "mlp_stat"
require "mstd"  --debug
require "disp"  --debug

local filename = assert(arg[1])

-- Load source.
local fh = assert(io.open(filename))
local source = fh:read("*a")
fh:close()

-- Convert source to AST (syntax tree).
local c = mlp.block(mll.new(source))

--Display AST.
--print(tostringv(c))
--print(disp.ast(c))
--print("---")
--for k,v in pairs(c) do print(k,disp.ast(v)) end

-- Helper function: Parse current node in AST recursively.
function traverse(ast, scope, level)
  level = level or 1
  scope = scope or {}

  local blockrecurse

  if ast.tag == "Local" or ast.tag == "Localrec" then
    local vnames, vvalues = ast[1], ast[2]
    for i,v in ipairs(vnames) do
      assert(v.tag == "Id")
      local vname = v[1]
      --print(level, "deflocal",v[1])
      local parentscope = getmetatable(scope).__index
      parentscope[vname] = "local"
    end
    blockrecurse = 1
  elseif ast.tag == "Id" then
    local vname = ast[1]
    --print(level, "ref", vname, scope[vname])
    if not scope[vname] then
      print(string.format("undefined %s at line %d", vname, ast.line))
    end
  elseif ast.tag == "Function" then
    local params = ast[1]
    local body = ast[2]
    for i,v in ipairs(params) do
      local vname = v[1]
      assert(v.tag == "Id" or v.tag == "Dots")
      if v.tag == "Id" then
        scope[vname] = "local"
      end
    end
    blockrecurse = 1
  elseif ast.tag == "Let" then
    local vnames, vvalues = ast[1], ast[2]
    for i,v in ipairs(vnames) do
      local vname = v[1]
      local parentscope = getmetatable(scope).__index
      parentscope[vname] = "global" -- note: imperfect
    end
    blockrecurse = 1
  elseif ast.tag == "Fornum" then
    local vname = ast[1][1]
    scope[vname] = "local"
    blockrecurse = 1
  elseif ast.tag == "Forin" then
    local vnames = ast[1]
    for i,v in ipairs(vnames) do
      local vname = v[1]
      scope[vname] = "local"
    end
    blockrecurse = 1
  end

  -- recurse (depth-first search through AST)
  for i,v in ipairs(ast) do
    if i ~= blockrecurse and type(v) == "table" then
      local scope = setmetatable({}, {__index = scope})
      traverse(v, scope, level+1)
    end
  end
end

-- Default list of defined variables.
local scope = setmetatable({}, {__index = globals})

traverse(c, scope) -- Start check.

Example:

-- test1.lua
local y = 5
local function test(x)
  print("123",x,y,z)
end

local factorial
function factorial(n)
  return n == 1 and 1 or n * factorial(n-1)
end

g = function(w) return w*2 end

for k=1,2 do print(k) end

for k,v in pairs{1,2} do print(v) end

test(2)
print(g(2))

Output:

$ lua lint.lua test1.lua
undefined z at line 4

A much more extensive version is in LuaInspect. Another more Metalua-ish (and possibly better) Metalua implementation given by Fabien is in [1], and an even simpler one is below. See also MetaLua info.

Something similar could be done using other Lua parsers (see LuaGrammar and in particular LpegRecipes), such as Leg [2].
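Whichever parser is used, the scope bookkeeping in lint.lua above carries over: each nested scope is a table whose __index points at the enclosing scope, so a name lookup walks outward until it reaches the globals. A minimal sketch of that technique on its own follows, with the hypothetical helper new_scope and made-up scope contents.

```lua
-- Create a child scope whose lookups fall through to `parent`.
local function new_scope(parent)
  return setmetatable({}, { __index = parent })
end

local globals = { print = "global" }
local chunk   = new_scope(globals)   -- top-level scope of a file
chunk.y = "local"                    -- as if the file said: local y

local body = new_scope(chunk)        -- scope of a nested function body
body.x = "local"                     -- as if the function had a parameter x

-- Lookup walks outward: body -> chunk -> globals.
print(body.x, body.y, body.print)    -- local  local  global
print(body.z)                        -- nil: z would be reported as undefined
print(rawget(chunk, "x"))            -- nil: inner names do not leak outward
```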

Another Metalua solution

This piece of Metalua code uses the standard walker libraries to print a list of all global variables used in the program where it's inserted:

-{ block:
   require 'walk.id' -- Load scope-aware walker library
   -- This function lists all the free variables used in `ast'
   function list_globals (ast)
      -- Free variable names will be accumulated as keys in table `globals'
      local walk_cfg, globals = { id = { } }, { }
      function walk_cfg.id.free(v) globals[v[1]] = true end
      walk_id.block(walk_cfg, ast)
            -- accumulate global var names in the table "globals"
      print "Global vars used in this chunk:"
      for v in keys(globals) do print(" - "..v) end
   end
   -- Hook the globals lister after the generation of a chunk's AST:
   mlp.chunk.transformers:add(list_globals) }

--FabienFleutot

Another Metalua solution: Metalint

"Metalint [4] is a utility that checks Lua and Metalua source files for global variables usage. Beyond checking toplevel global variables, it also checks fields in modules: for instance, it will catch typos such as taable.insert(), but also table.iinsert(). Metalint works with declaration files, which list which globals are declared, and what can be done with them...." [4]

Approach #3: Mixed Run-time/Compile-time Approach

Hybrid approaches are possible. Note that detection of global variable accesses (at least direct ones not through _G or getfenv()) is best done at compile time, while determination of whether those global variables are defined may best be done at run-time (or possibly, sufficiently so, at "load time", about when loadfile is done). So, a compromise would be to split these two concerns and do them when most appropriate. Such a mixed approach is taken by the ["checkglobals" module+patch], which provides a checkglobals(f, env) function (implemented entirely in Lua). In short, checkglobals validates that the function f (which by default is taken to be the calling function) uses only global variables defined in the table env (which by default is taken to be the environment of f). checkglobals requires a small patch to add an additional 'g' option to the debug library's debug.getinfo / lua_getinfo function to list the global variable accesses lexically inside the function f.
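The split between the two concerns can be illustrated without the patch. The sketch below is not the checkglobals module: the list of accessed names is written by hand where a compile-time pass would normally supply it, and undefined_names is a hypothetical helper that performs only the load-time half of the check.

```lua
-- Hypothetical helper: report which of `names` have no value in `env`.
local function undefined_names(names, env)
  local missing = {}
  for _, name in ipairs(names) do
    if env[name] == nil then
      missing[#missing + 1] = name
    end
  end
  return missing
end

-- Names as a static pass might report them for:  print(math.floor(totl))
local used = { "print", "math", "totl" }

-- The check runs once, at load time, against the environment the code will use.
local env = { print = print, math = math, total = 0 }
local missing = undefined_names(used, env)
print(table.concat(missing, ", "))   -- totl
```

Because the environment is inspected when the check runs, a global that is defined later (or removed afterwards) is still judged by its state at load time, which is the compromise this approach accepts.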

Semantically Aware Editors

See editors/IDEs under ProgramAnalysis for editors that highlight undefined variables. This can be implemented by static analysis and/or by invoking the Lua interpreter. This approach is convenient because any errors are immediately displayed in context on the screen, without invoking an external build tool and browsing through its output.

Lua Syntax Extensions

A few syntax extensions have been proposed to handle undefined variables more automatically by the Lua compiler:

Historical: Old Lua 4 Notes

Here's a quick and crude solution to prevent assignment to undefined globals, in Lua 4.0:

function undefed_global(varname, newvalue)
  error("assignment to undefined global " .. varname)
end

function guard_globals()
  settagmethod(tag(nil), "setglobal", undefed_global)
end

Once guard_globals() has been called, any assignment to a global with a nil value will generate an error. So typically you would call guard_globals() after you've loaded your scripts, and before you run them. For example:

SomeVariable = 0

function ClearVariable()
  SomeVariabl = 1      -- typo here
end

-- now demonstrate that we catch the typo
guard_globals()

ClearVariable()        -- generates an error at the typo line

The "getglobal" tag method can similarly be used to catch reads of undefined globals. Also, with more code, a separate table can be used to distinguish between "defined" globals that happen to have a nil value, and "undefined" globals which have never been accessed before.

See Also


Last edited October 4, 2014 10:59 pm GMT (diff)