Table Scope

Abstract

Described here are various solutions to allow table constructs to enclose a scope so that variables used inside the table have special meaning. In other words, we want something like this:

local obj = struct "foo" {
              int "bar";
              string "baz";
            }
assert(obj == "foo[int[bar],string[baz]]")
assert(string.lower("ABC") == "abc")
assert(int == nil)

Problem

Lua is good for data definition (as discussed in Programming in Lua):

local o = rectangle { point(0,0), point(3,4) }

-- another example:
html{ h1"This is a header", p"This is a paragraph",
      p{"This is a ", em"styled", " paragraph"} }

-- another example:
struct "name" {
  int    "bar";
  string "baz";
}

The syntax is nice. The semantics are less so: string is a global variable scoped outside of struct, even though it only needs to have meaning inside of struct. This can create problems. For example, the above requires redefining the standard table string.
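A small sketch of the hazard, assuming the data definition language installs its tag as a global named string (the replacement function here is hypothetical and only for illustration):

```lua
local saved = string            -- keep the standard library table

-- Hypothetical tag function installed globally for the data definition.
string = function(name) return "string[" .. name .. "]" end

local field = string "baz"      -- works as a tag: "string[baz]"

-- The library is no longer reachable through the global name:
-- indexing a function value raises an error, so ok is false.
local ok = pcall(function() return string.lower("ABC") end)

string = saved                  -- restore the standard table

print(field, ok)
```

Anything else in the program that refers to the global string between the redefinition and the restore would see the tag function instead of the library.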

Requiring a prefix will solve the problem but can be cumbersome and ugly, being foreign to the problem domain that data definition intends to describe:

local struct = require "struct"
...
local o = struct "name" {
            struct.int  "bar";
            struct.string "baz";
          }

We could restrict the scope with locals, but defining them can become cumbersome too, especially if your data definition language contains hundreds of tags:

local struct = require "struct"
...
local o; do
  local int = struct.int
  local string = struct.string
  o = struct "name" {
        int  "bar";
        string "baz";
      }
end

In fact, we might want some word to have different meanings depending on nesting context:

action { point { at = location { point(3,4) } } }

An alternative might be

struct "name" {
  int = "bar";
  string = "baz";
}

in which case int and string are now strings rather than global variables. However, here the ordering and multiplicity of the arguments are lost (both are important in structs). Another alternative is

struct "name" .
  int "bar" .
  string "baz" .
endstruct

Semantically, that's better (except one must not forget the endstruct), but the dot syntax is somewhat unusual. Semantically we might want something that is like an S-expression:

struct {
  "name",
  {"int", "bar"},
  {"string", "baz"}
}

but syntactically that is lacking.
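For comparison, the S-expression form needs no scoping support at all, since every tag is an ordinary string. A minimal sketch of an interpreter for it follows; the struct function and its output format are assumptions chosen to match the string format used in the abstract:

```lua
-- Hypothetical interpreter for the S-expression-like form.
-- def[1] is the struct name; each later entry is {tag, field_name}.
local function struct(def)
  local name = def[1]
  local fields = {}
  for i = 2, #def do
    local tag, field = def[i][1], def[i][2]
    fields[#fields + 1] = tag .. "[" .. field .. "]"
  end
  return name .. "[" .. table.concat(fields, ",") .. "]"
end

local obj = struct {
  "foo",
  {"int", "bar"},
  {"string", "baz"}
}
print(obj)  --> foo[int[bar],string[baz]]
```

Ordering and multiplicity are preserved because the fields live in the array part of the table.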

Solution: Table Scope Patch for Lua

There is a "table scope" patch that allows this type of thing:

local function struct(name)
  local scope = {
    int = function(name) return "int[" .. name .. "]" end;
    string = function(name) return "string[" .. name .. "]" end
  }
  return setmetatable({}, {
    __scope = scope;
    __call = function(_, t)
               return name .. "[" .. table.concat(t, ",") .. "]"
             end
  })
end

local obj = struct "foo" {
              int "bar";
              string "baz";
            }
assert(obj == "foo[int[bar],string[baz]]")
assert(string.lower("ABC") == "abc")
assert(int == nil)
print "DONE"

Download patch: [tablescope.patch] (for Lua 5.1.3)

The patch changes neither Lua syntax nor Lua bytecodes. The only change is support for a new metamethod named __scope. If a table construct is used as the last argument of a function call, and the object being called has a __scope metamethod that is a table, then global variables mentioned inside the table construct are first looked up in __scope. Only if a variable is not found in __scope is it then looked up in the environment table as usual.
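The lookup order can be illustrated in plain Lua. The sketch below only emulates the rule as described here; it is not the patch, and scoped_lookup, scope and env are hypothetical names. It also assumes that "not found" means the value in __scope is nil:

```lua
-- Emulates only the lookup order; it is not the patch itself.
local function scoped_lookup(scope, env, name)
  local v = scope[name]
  if v ~= nil then return v end   -- assumption: "not found" means nil
  return env[name]
end

local scope = {
  int = function(name) return "int[" .. name .. "]" end,
  string = function(name) return "string[" .. name .. "]" end,
}
local env = { string = string, print = print }  -- stand-in environment table

print(scoped_lookup(scope, env, "int")("bar"))      -- found in scope
print(scoped_lookup(scope, env, "string")("baz"))   -- scope shadows env
print(scoped_lookup(scope, env, "print") == print)  -- falls through to env
```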

The patch makes some assumptions about the order of bytecodes to infer how tables nest global variable accesses. It's possible these assumptions would not be met if the bytecodes were not compiled by luac (e.g. compiled by MetaLua). However, the effect of not meeting these assumptions in a certain function would generally be that table scoping is simply not applied to that function, though there may be very unusual cases having security implications.

There could be ways to reduce the performance impact on global accesses using this patch. Suggestions are welcome. For example, the table scoping lookup could be selectively enabled or disabled on a specific function. If performance is a concern, you should be using local variables anyway.

MetaLua generates the exact same bytecode as luac, unless you use Goto or Stat. Besides, if you were using MetaLua, you'd also use it to handle table scopes rather than patching Lua -- FabienFleutot

Solution: Global Environment Tricks

Avoiding patches, we can do some tricks with the Lua environment table. The following pattern might be used (original idea suggested by RiciLake):

-- shapes.lua
local M = {}

local Rectangle = {
  __tostring = function(self)
    return string.format("rectangle[%s,%s]",
      tostring(self[1]), tostring(self[2]))
  end
}
local Point = {
  __tostring = function(self)
    -- %f expects numbers, so pass the coordinates directly
    return string.format("point[%f,%f]", self[1], self[2])
  end
}

function M.point(x,y)
  return setmetatable({x,y}, Point)
end
function M.rectangle(t)
  local point1 = assert(t[1])
  local point2 = assert(t[2])
  return setmetatable({point1, point2}, Rectangle)
end

return M

-- shapes_test.lua

-- with: namespace, [level], [filter] --> (lambda: ... --> ...)
function with(namespace, level, filter)
  level = level or 1; level = level + 1

  -- Handle __with metamethod if defined.
  local mt = getmetatable(namespace)
  if type(mt) == "table" then
    local custom_with = mt.__with
    if custom_with then
      return custom_with(namespace, level, filter)
    end
  end

  local old_env = getfenv(level)  -- Save

  -- Create local environment.
  local env = {}
  setmetatable(env, {
    __index = function(env, k)
      local v = namespace[k]; if v == nil then v = old_env[k] end
      return v
    end
  })
  setfenv(level, env)

  return function(...)
    setfenv(2, old_env)       -- Restore
    if filter then return filter(...) end
    return ...
  end
end

local shapes = require "shapes"

local o = with(shapes) (
  rectangle { point(0,0), point(3,4) }
)
assert(not rectangle and not point) -- note: not visible here
print(o)
-- outputs: rectangle[point[0.000000,0.000000],point[3.000000,4.000000]]

The key is the with function, which provides local access to a given namespace. It is similar in purpose to the "with" clause in some other languages like VB and somewhat related to using namespace in C++ or import static in Java. It might also be similar to XML namespaces.

The following special case correctly outputs the same result:

point = 2
function calc(x) return x * point end
local function calc2(x) return x/2 end
local o = with(shapes) ( rectangle { point(0,0), point(calc2(6),calc(2)) } )
print(o)
-- outputs: rectangle[point[0.000000,0.000000],point[3.000000,4.000000]]

The optional arguments to with can be useful when defining wrappers to simplify expressions:

function shape_context(level)
  return with(shapes, (level or 1)+1, function(x) return x[1] end)
end

local o = shape_context() {
  rectangle { point(0,0), point(3,4) }
}
print(o)
-- outputs: rectangle[point[0.000000,0.000000],point[3.000000,4.000000]]

Further simplification is possible by automatically invoking with when a global key is accessed:

setmetatable(_G, {
  __index = function(t, k)
    if k == "rectangle" then
       return with(shapes, 2, function(...) return shapes.rectangle(...) end)
    end
  end
})

local o = rectangle { point(0,0), point(3,4) }
print(o)
-- outputs: rectangle[point[0.000000,0.000000],point[3.000000,4.000000]]

One caveat is that this approach relies on undocumented Lua behavior. The function name with must be resolved before the arguments to the function are resolved, which is the behavior in the current release of Lua 5.1.
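The evaluation order in question can be observed with a logging table. The sketch below indexes a local table explicitly instead of performing real global lookups, so that it does not depend on setfenv; the names env, f, a and b are illustrative only:

```lua
local log = {}

-- Every field access is recorded, then answered with a pass-through function.
local env = setmetatable({}, {
  __index = function(_, key)
    log[#log + 1] = key
    return function(...) return ... end
  end
})

-- The called expression (env.f) and its arguments (env.a, env.b)
-- each trigger one logged lookup.
local _ = env.f(env.a, env.b)
print(table.concat(log, " "))
```

On the PUC-Rio implementation the lookup of f is logged before those of a and b, which is the order with depends on. Since the reference manual does not promise this, other implementations or future versions could behave differently.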

Also, though the desire is for with to provide a type of lexical scoping, and it simulates it fairly well, the implementation is actually more dynamic. The following will cause a run-time error since real lexicals (locals) override globals:

local point = 123
local o = with(shapes) ( rectangle { point(0,0), point(3,4) } )

Further, we assume that the environment of the caller can be temporarily replaced with no ill-effects.

Leaving off the final call (accidentally) will not necessarily result in an error but will leave the environment changed:

local o = with(shapes)
assert(rectangle) -- oops! rectangle is visible now.

But that does suggest this usage is possible (though not necessarily a good idea):

local f = with(shapes) -- begin scope
local o1 = rectangle { point(0,0), point(3,4) }
local o2 = rectangle { point(0,0), point(5,6) }
f() -- end scope
assert(not rectangle) -- rectangle no longer visible

Another problem is that if an error is raised while evaluating the arguments, the environment will not be restored:

local last = nil
function test()
  last = rectangle
  local o = with(shapes) ( rectangle { point(3,4) } ) -- raises error
end
assert(not pcall(test))
assert(not last)
assert(not pcall(test))
assert(last)  -- oops! environment not restored

Unfortunately, there doesn't seem to be any unobtrusive way to wrap the arguments in a pcall. We can do it using a new pwith function that accepts the data wrapped in the awkward function() return ... end syntax for later evaluation by a pcall:

local o = pwith(shapes)(function() return
  rectangle { point(3,4) } -- raises error
end)

In fact, this approach can avoid the issues of reliance on undocumented behavior, danger of leaving off the second call, and the danger of touching the caller's environment as discussed above. We just need to live with the syntax (see also the "Global Collector" pattern above, which applies a similar syntax and semantics).
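No implementation of pwith is given here, so the following is only one way it might be written. It changes the environment of the function passed in rather than that of the caller, and re-raises any error after the pcall. The helpers get_env and set_env are hypothetical shims: they use getfenv/setfenv where available (Lua 5.1) and otherwise assume the environment is an upvalue named _ENV (Lua 5.2 and later, with debug information present). The shapes table is a simplified stand-in for the module above.

```lua
-- Index of the _ENV upvalue of f, or nil (only used on Lua 5.2+).
local function env_upvalue_index(f)
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)
    if name == nil then return nil end
    if name == "_ENV" then return i end
    i = i + 1
  end
end

local function get_env(f)
  if getfenv then return getfenv(f) end          -- Lua 5.1
  local i = env_upvalue_index(f)
  if i then return select(2, debug.getupvalue(f, i)) end
  return _G
end

local function set_env(f, env)
  if setfenv then return setfenv(f, env) end     -- Lua 5.1
  local i = env_upvalue_index(f)
  -- Give f a private _ENV upvalue so that the enclosing chunk is untouched.
  if i then debug.upvaluejoin(f, i, function() return env end, 1) end
  return f
end

-- Sketch of pwith: look names up in namespace first, then in f's old environment.
local function pwith(namespace)
  return function(f)
    local outer = get_env(f)
    set_env(f, setmetatable({}, {
      __index = function(_, k)
        local v = namespace[k]
        if v == nil then v = outer[k] end
        return v
      end
    }))
    local function finish(ok, ...)
      if not ok then error((...), 0) end         -- re-raise the original error
      return ...
    end
    return finish(pcall(f))
  end
end

-- Simplified stand-in for the shapes module.
local shapes = {
  point = function(x, y) return "point[" .. x .. "," .. y .. "]" end,
  rectangle = function(t)
    assert(t[1] and t[2], "rectangle needs two points")
    return "rectangle[" .. t[1] .. "," .. t[2] .. "]"
  end,
}

local o = pwith(shapes)(function()
  return rectangle { point(0, 0), point(3, 4) }
end)
print(o)

local ok = pcall(function()
  return pwith(shapes)(function() return rectangle { point(3, 4) } end)
end)
print(ok, rectangle)  -- the failed call leaves the caller's globals alone
```

Because only the data function's environment is replaced, nothing needs to be restored afterwards, whether or not the function raises an error.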

Another variation is to invoke the pwith after the fact (e.g. outside of the configuration file):

rect = function() return rectangle { point(3,4) } end
...
pwith(shapes)(rect)

or perhaps pwith can be triggered upon a __newindex metamethod event on _G.

The non-pcall form may be ok. Just note the contract for using it: if it raises an exception, throw away the function that called it. This may well be a good approach for a configuration language.

Note, however, that the above approaches do not work that well with some methods of DetectingUndefinedVariables. rectangle and point may be identified as undefined variables, particularly under static checks for undefined globals (these are global/environment variables not defined in the top-level script).

If we don't need particular access to upvalues, we can stringify the above data function (see the "Stringified Anonymous Functions" pattern in ShortAnonymousFunctions for details):

local o = pwith(shapes)[[
  rectangle { point(x,y) }
]]{x = 3, y = 4}

We lose direct access to lexicals in the caller, but pwith could prepend locals to the data string so that rectangle and point (as well as x and y) become lexicals. pwith could implement it like this, provided the limit on the number of local variables is not reached:

local code = [[
  local rectangle, point, x, y = ...
]] .. datastring
local f = loadstring(code)(namespace.rectangle, namespace.point, x, y)
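A more complete sketch of that idea follows. The helper name eval_with_locals is hypothetical, the namespace is a simplified stand-in for shapes, and the data string is assumed to be a single expression, so a return is prepended. Keys that are not plain identifiers are skipped; reserved words used as keys are not handled.

```lua
local unpack = table.unpack or unpack   -- Lua 5.2+ / Lua 5.1
local compile = loadstring or load      -- Lua 5.1 / Lua 5.2+

-- Hypothetical helper: evaluates datastring (an expression) with every
-- identifier-like key of namespace and vars bound as a local variable.
local function eval_with_locals(namespace, datastring, vars)
  local names, values = {}, {}
  local function collect(t)
    for k, v in pairs(t) do
      if type(k) == "string" and k:match("^[%a_][%w_]*$") then
        names[#names + 1] = k
        values[#values + 1] = v
      end
    end
  end
  collect(namespace)
  collect(vars or {})
  local code = "local " .. table.concat(names, ", ") .. " = ...\n"
            .. "return " .. datastring
  local chunk = assert(compile(code))
  return chunk(unpack(values))
end

-- Simplified stand-in for the shapes module.
local ns = {
  point = function(x, y) return "point[" .. x .. "," .. y .. "]" end,
  rectangle = function(t) return "rectangle[" .. table.concat(t, ",") .. "]" end,
}

local o = eval_with_locals(ns, "rectangle { point(x, y) }", { x = 3, y = 4 })
print(o)  --> rectangle[point[3,4]]
```

Since the names become true locals of the compiled chunk, they never appear in any environment table.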

Solution: using Metalua

Here's an implementation in Metalua:

-- with.lua

function with_expr_builder(t)
  local namespace, value = t[1], t[2]
  local tmp = mlp.gensym()
  local code = +{block:
    local namespace = -{namespace}
    local old_env = getfenv(1)
    local env = setmetatable({}, {
      __index = function(t,k)
        local v = namespace[k]; if v == nil then v = old_env[k] end
        return v
      end
    })
    local -{tmp}
    local f = setfenv((|| -{value}), env)
    local function helper(success, ...)
      return {n=select('#',...), success=success, ...}
    end
    let -{tmp} = helper(pcall(f))
    if not -{tmp}.success then error(-{tmp}[1]) end
  }
  -- NOTE: Stat seems to only support returning a single value.
  --       Multiple return values are ignored (even though attempted here)
  return `Stat{code, +{unpack(-{tmp}, 1, -{tmp}.n)}}
end

function with_stat_builder(t)
  local namespace, block = t[1], t[2]
  local tmp = mlp.gensym()
  local code = +{block:
    local namespace = -{namespace}
    local old_env = getfenv(1)
    local env = setmetatable({}, {
      __index = function(t,k)
        local v = namespace[k]; if v == nil then v = old_env[k] end
        return v
      end
    })
    local -{tmp}
    local f = setfenv(function() -{block} end, env)
    local success, msg = pcall(f)
    if not success then error(msg) end
  }
  return code
end

mlp.lexer.register { "with", "|" }

mlp.expr.primary.add {
  "with", "|", mlp.expr, "|", mlp.expr,
  builder=with_expr_builder
}

mlp.stat.add {
  "with", mlp.expr, "do", mlp.block, "end",
  builder=with_stat_builder
}

Example usage:

-{ dofile "with.luac" }

local shapes = require "shapes"
rectangle = 123  -- no problem

local o = with |shapes|
          rectangle { point(0,0), point(3,4) }
print(o)
--outputs: rectangle[point[0.000000,0.000000],point[3.000000,4.000000]]

local o
with shapes do
  o = rectangle { point(0,0), point(3,4) }
end
print(o)
--outputs: rectangle[point[0.000000,0.000000],point[3.000000,4.000000]]

local other = {double = function(x) return 2*x end}

local o = with |other|
          with |shapes|
          rectangle { point(0,0), point(3,double(4)) }
print(o)
--outputs: rectangle[point[0.000000,0.000000],point[3.000000,8.000000]]

local o
local success, msg = pcall(function()
  o = with |shapes| rectangle { point(0,0) }
end)
assert(not success)
assert(rectangle == 123) -- original environment

Update (2010-06): The setfenv above can be avoided, as would be necessary in Lua 5.2.0-work3 (see below).

Lua 5.2

Lua-5.2.0-work3 removes setfenv (though preserves partial support for it in the debug library). This invalidates many of the "Environment Trick" techniques above, although they weren't that good anyway.

In Lua-5.2.0-work3, _ENV would permit

local o = (function(_ENV) return rectangle { point(0,0), point(3,4) } end)(shapes)

That has similar qualities to the above 5.1 solution "pwith(shapes)(function() return ..... end)" but doesn't need a pcall. The extra function above is only used to introduce a lexical (_ENV) in the middle of an expression, but we could remove that function if we define _ENV in a separate statement:

local o; do local _ENV = shapes
  o = rectangle { point(0,0), point(3,4) }
end

That could be cleaner syntactically, but it's not that bad, unless we need to switch environments a lot (e.g. evaluate the arguments of shapes, rectangle, and point each in their own environments).
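With _ENV set directly to shapes, ordinary globals such as print or type are no longer reachable inside the block. One possible refinement, sketched here under the assumption of Lua 5.2 or later, is an environment that tries the namespace first and then falls back to the globals. The scoped_env helper is hypothetical and shapes is a simplified stand-in; the version check only keeps the sketch inert on Lua 5.1, where _ENV has no special meaning.

```lua
-- Simplified stand-in for the shapes module.
local shapes = {
  point = function(x, y) return "point[" .. x .. "," .. y .. "]" end,
  rectangle = function(t) return "rectangle[" .. table.concat(t, ",") .. "]" end,
}

-- Hypothetical helper: look in namespace first, then in fallback.
local function scoped_env(namespace, fallback)
  return setmetatable({}, {
    __index = function(_, k)
      local v = namespace[k]
      if v == nil then v = fallback[k] end
      return v
    end
  })
end

local o, kind
if _VERSION ~= "Lua 5.1" then
  local _ENV = scoped_env(shapes, _G)   -- _G is still read from the old environment
  o = rectangle { point(0, 0), point(3, 4) }
  kind = type(o)                        -- type is found through the fallback
end
print(o, kind)
```

Outside the if block, rectangle and point are again undefined, since the new _ENV is an ordinary local whose scope ends with the block.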


--DavidManura, 2007/2008

See Also


Last edited June 15, 2010 3:11 am GMT