Lua Module Function Critiqued

The argument presented here is that the Lua 5.1 module function[1] has design flaws that encourage poor practices in module design, potentially leading to code bugs and ambiguities through side-effects in global variables, and this function should be avoided. It is the hope that this article will further deter the use of the module function and that this function would be either removed or improved upon in a future version of Lua.

Before detailing the perils of the module function, we'll note that whether or not to use it is more than a personal choice: it affects other authors. It is quite easy for a Lua module author to avoid writing module calls. Indeed, this function is never required for defining modules; it is just a simple helper that wraps common behaviors that are required neither by Lua nor by the other, much more useful parts of the Lua 5.1 module system such as require. However, modules often use other modules written by other authors who may themselves have used the module function, and the module function causes global side-effects, so its effects cannot be entirely avoided by choice without modifying the implementation of those other modules. In practice, use of the module function is fairly common, likely because it is included in the Lua standard libraries, presumably as a convenience and a standardized best practice for module definition, and because a number of official or reputable Lua sources, such as the Lua Reference Manual[2] and Programming in Lua (PiL)[3], encourage its use and even present it as good practice. As a result, new users quickly become accustomed to using the module function.

The usual way to define a module with the module function is like this:

-- hello/world.lua
module(..., package.seeall)
local function test(n) print(n) end
function test1() test(123) end
function test2() test1(); test1() end

and it is used like this:

require "hello.world"
require "anothermodule"
hello.world.test2()

This article raises two main complaints about the module function, both of which can be seen if anothermodule is defined like this:

-- anothermodule.lua
module(..., package.seeall)
assert(hello.world.hello.world.print == _G.print)  -- weird
assert(hello ~= nil) -- where'd this come from anyway?

First, the global namespace is accessible by indexing the module table; second, hello is visible in this module even though it was not requested by it.

The first complaint is less inherent to the module function and is due only to the package.seeall option. package.seeall allows a module to see global variables, which are normally hidden since the module function replaces the current environment of the module with a local one. What package.seeall does is muck with the metatable of the module's environment so that lookups fall back to _G. This not only allows the module itself to access _G, but also makes the variables in _G part of the module's interface. Among other things, exposing the global environment through the module table could be detrimental to sandboxing (see SandBoxes), and these variables might be used accidentally, but more glaringly it's just plain weird.
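The leak described above can be reproduced without calling module at all. The following is a minimal sketch that only emulates the effect of package.seeall by installing an __index fallback to _G on a plain table; the table M and the function M.test are hypothetical stand-ins, not part of any real module.

```lua
-- Emulates the effect of package.seeall on a module table (module is never called).
local M = {}                          -- hypothetical stand-in for the module table
setmetatable(M, { __index = _G })     -- the kind of fallback package.seeall installs

function M.test() return 123 end      -- the module's intended interface

assert(M.test() == 123)
assert(M.print == print)              -- a global leaks through the module table
assert(rawget(M, "print") == nil)     -- although it was never stored in M
assert(M._G == _G)                    -- the whole global table is reachable too
```

Because the lookup goes through the metatable, a client cannot tell M.print apart from a function the module meant to export unless it uses rawget.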

Luckily, package.seeall is only a convenience option and can be avoided as such:

-- hello/world.lua
local _G = _G
module(...)
function test() _G.print(123) end

or

-- hello/world.lua
local print = print
module(...)
function test() print(123) end

Those are a bit awkward, but there may be other more syntactically pleasing ways to avoid it, such as by recognizing that the module table and the module environment table need not be the same (e.g. see LuaDesignPatterns -- "Module System with Public/Private Namespaces"). We won't go into further detail on this first point.

The second point is that the module function has the side effect of creating global variables named in ways the programmer doesn't fully control. On executing module("hello.world"), the function creates a table named "hello" in the global environment (the initial global environment, not the current environment set through setfenv), and stores the module table under the key "world" in that table. However, if one of those names already holds a value that is not a table (e.g. someone else placed it there), the function raises an error, which at least provides some level of safety. The behavior of the module function can best be understood with the following representation of it in Lua taken from LuaCompat[4] (the real version is in loadlib.c).

local _LOADED = package.loaded
function _G.module (modname, ...)
  local ns = _LOADED[modname]
  if type(ns) ~= "table" then
    ns = findtable (_G, modname)
    if not ns then
      error (string.format ("name conflict for module '%s'", modname))
    end
    _LOADED[modname] = ns
  end
  if not ns._NAME then
    ns._NAME = modname
    ns._M = ns
    ns._PACKAGE = gsub (modname, "[^.]*$", "")
  end
  setfenv (2, ns)
  for i, f in ipairs (arg) do
    f (ns)
  end
end
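The listing above calls a helper named findtable that it does not show. As a rough illustration of the behavior described in the text (create the missing tables along a dotted name, and report a conflict when a non-table value is in the way), here is a sketch. The name findtable_sketch and the globals table are hypothetical; this is not the LuaCompat or loadlib.c code, and it writes into a private table rather than _G so it can be run safely.

```lua
-- Hypothetical stand-in for the findtable helper used in the listing above.
-- Walks the dotted name starting at `root`, creating missing tables on the way.
-- Returns the innermost table, or nil if a non-table value blocks the path.
local function findtable_sketch(root, name)
  local t = root
  for key in string.gmatch(name, "[^.]+") do
    local v = rawget(t, key)
    if v == nil then
      v = {}
      t[key] = v
    elseif type(v) ~= "table" then
      return nil                      -- name conflict
    end
    t = v
  end
  return t
end

local globals = {}                    -- stands in for _G in this sketch
local world = findtable_sketch(globals, "hello.world")
assert(globals.hello.world == world)  -- 'hello' and 'hello.world' were both created

globals.test = function() end
assert(findtable_sketch(globals, "test.more") == nil)  -- 'test' is a function
```

Note that an existing table is reused rather than rejected, which is why a module named with an already-used prefix silently adds a key to someone else's table.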

The problem arises because different modules, maintained by different people, all write to the global environment. Furthermore, an application using those modules may be writing to the global environment as well. Due to information hiding,[5] the modules and the application should have no knowledge of the internal workings or implementation of the other modules, and possibly not even of the names of the modules those modules require. The result is that a program lacks control over which global variables get set. Several variants of this problem are illustrated below.

In the following examples, we will as a convenience define modules inline rather than in separate files. For example, rather than creating two files like such

-- mymodule.lua
module(...)
function test() return 1+2 end

-- mymodule_test.lua
require "mymodule"
print(mymodule.test())

we will simply write

(function()
  module("mymodule")
  function test() return 1+2 end
end)();
print(mymodule.test())

Here is the first example:

(function()
  local require = require
  local print = print
  local module = module
  module("yourmodule");

  (function()
    module("mymodule")
  end)()

  print(mymodule ~= nil) -- prints false
end)();

print(mymodule ~= nil) -- prints true

As shown, loading such a module always populates the global environment rather than the current environment in which the module is used. This is the reverse of what is needed.

(function()
  local _G = _G
  module("mymodule");

  (function() _G.module("one") end)();
  (function() _G.module("two") end)();
  (function() _G.module("three") end)()

  function hello() _G.print("hello") end
end)();

mymodule.hello()
assert(one and two and three) -- junk

As shown above, if a program loads a module that loads other modules, extra junk is placed in the global namespace.

Another problem, as Mark Hamburg notes[9], is that putting modules into the global namespace hides dependencies. If your program does require "baz" and baz just happens to load foo, then your program could inadvertently depend on foo through the global that was left behind, and your program will break if baz later removes its dependency.
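For contrast, here is a minimal sketch of the same foo/baz situation when both modules return tables instead of writing globals. The module bodies are hypothetical, and they are registered through package.preload only so that the example needs no files on disk.

```lua
-- Hypothetical modules registered via package.preload (no files needed).
package.preload["foo"] = function()
  return { name = "foo" }
end

package.preload["baz"] = function()
  local foo = require "foo"           -- private dependency of baz
  return {
    describe = function() return "baz uses " .. foo.name end,
  }
end

local baz = require "baz"
assert(baz.describe() == "baz uses foo")
assert(rawget(_G, "foo") == nil)      -- no stray global for the program to lean on

local foo = require "foo"             -- the program must state its own dependency
assert(foo.name == "foo")
```

If baz later stops requiring foo, the explicit require in the program still works, so the breakage described above does not occur.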

The following two examples are related to each other:

function test() return 1+2 end

(function()
  module("mymodule", package.seeall);

  (function()
    module("test.more") -- fails: name conflict for module 'test.more'
    function hello() return 1+2 end
  end)()
end)()

and

(function()
  module("test")
  function check() return true end
end)();

(function()
  module("test.check") -- fails: name conflict for module 'test.check'
  function hello() return 1+2 end
end)();

As seen, package names and regular variable names conflict. The module function does detect and raise an error if a global variable it would overwrite already exists. That's what we want, right? Well, this also means that it is hard to predict whether loading a module will succeed, since the module may load other modules whose names (and the names of whose members) we might not know and that conflict with global variables.

As a side note, in some other languages (e.g. Perl), variables and package names are maintained in separate namespaces and so are prevented from conflicting.[*3] It is also noteworthy that module naming conventions affect if and how names conflict. For example, Java package names[6] are conventionally prefixed by a (unique) domain name under the author's control, which is verbose but provides a mechanism to avoid conflict. In Perl, CPAN provides a central naming registry to prevent conflicts, and modules with the same prefix indicate a common function rather than a common maintainer (e.g. "CGI"[7] and "CGI::Minimal"[8] are maintained independently by different authors, and "CGI::Minimal" is not stored inside the "CGI" table).

(function()
  module("mymodule", package.seeall);

  (function()
    module("test.more")
    function hello() return 1+2 end
  end)()

  function greet()
    test.more.hello()  -- fails -- attempt to index global 'test' (a function value)
  end
end)();

function test()
  mymodule.greet()
end

test()

Here, the program inadvertently overwrites a global variable set by the module function. The module function does not detect this. Rather, there is program failure (possibly a silent one) when a module that depends on this global variable attempts to access this variable.

(function()
  local require = require
  local module = module
  local print = print
  local _P = package.loaded
  module('yourmodule.two');

  (function()
    module('mymodule.one')
  end)()

  print(_P['mymodule.one'] ~= nil) -- prints true
end)();

local _P = package.loaded
print(_P['mymodule.one'] ~= nil) -- prints true

Storing modules in the global environment is in fact somewhat redundant since they are also stored in package.loaded (though without creating nested tables for the periods in the module name).
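The point about package.loaded can be checked directly. In this minimal sketch the module mymodule.one is registered through package.preload so that no file is needed; the loader body and the field answer are hypothetical placeholders.

```lua
-- 'mymodule.one' is supplied by a preload function; its contents are placeholders.
package.preload["mymodule.one"] = function()
  return { answer = 1 }
end

local one = require "mymodule.one"

assert(package.loaded["mymodule.one"] == one)  -- stored under the full dotted name
assert(package.loaded.mymodule == nil)         -- no nested 'mymodule' table is created
assert(rawget(_G, "mymodule") == nil)          -- and nothing was written to the globals
```

Any code that needs the module can therefore obtain it with require (or from package.loaded) without relying on a global of the same name.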

~~~

The problems above can be avoided by not using the module function but instead defining modules in the following simple way:[*1][*2]

-- hello/world.lua
local M = {}

local function test(n) print(n) end
function M.test1() test(123) end
function M.test2() M.test1(); M.test1() end

return M

and importing modules this way:

local MT = require "hello.world"
MT.test2()

Note that the public functions are clearly indicated with the M. prefix. Unlike when using module, the global environment is not visible through the MT table (i.e. MT.print == nil), the hello.world table has not been exported to (and has not polluted) the global environment but is instead held in a lexical (local) variable, and modules with the same prefix (e.g. hello.world.again) would not alter the hello.world table. In the client code, the module hello.world can be given a short abbreviation local to that code (e.g. MT). The approach also works well with DetectingUndefinedVariables. This is great. The one complaint is that public functions need to be prefixed with M. in the module itself, but the alternatives that are often proposed introduce their own problems and complexities, such as package.seeall noted above. It does not particularly hurt to be explicit with M. (two characters), especially as code size grows.
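These properties can be verified in a single file. The sketch below is a variant of the hello.world module above: it is registered through package.preload so that no file on disk is needed, and its functions return values instead of printing so that the results can be asserted. Both changes are assumptions made for the sake of a self-contained example.

```lua
-- Variant of hello/world.lua supplied via package.preload (an assumption of this sketch).
package.preload["hello.world"] = function()
  local M = {}
  local function test(n) return n end          -- private: never stored in M
  function M.test1() return test(123) end
  function M.test2() return M.test1() + M.test1() end
  return M
end

local MT = require "hello.world"

assert(MT.test2() == 246)
assert(MT.test == nil)                -- the private helper is not part of the interface
assert(MT.print == nil)               -- the global environment is not visible through MT
assert(rawget(_G, "hello") == nil)    -- no 'hello' table was created in the globals
```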

The module function may introduce more problems than it solves.

--DavidManura

Comments/Footnotes

[*1] (Advocates of the above style include RiciLake, DavidManura, others who have mentioned it on IRC, MikePall [10][11], ... (add your name here))

[*2] There has also been the suggestion to move the standard libraries in this direction [12].

[*3] Example in Perl where modules and variables of the same name do not conflict:

package One;
our $Two = 2;
package One::Two;
our $Three = 3;
package main;
print "$One::Two,$One::Two::Three" # prints 2,3


Not using the module function means that by omitting the local keyword, it could be very easy to pollute the global environment (which is bad, and is the very point of this article). So we could improve the module function by changing the environment to something private (that can inherit from _G) and defining in it the _M table (as now) that will contain the module's public interface. I was also concerned about these issues, and there is a tricky way to use the module function without cluttering the global environment. Here it is:

package.loaded[...]={}
module(...) -- you might want to add package.seeall

However, this does not stop the global namespace from being accessed through the module table. For that we would need a modified module function. Hopefully in the next Lua release.

--MildredKiLya

"Not using the module function means that by omitting the local keyword, it could be very easy to pollute the global environment (which is bad,...)" -- True, but unwanted global accesses are detectable prior to run-time using a method in DetectingUndefinedVariables, and I consider them errors that should be fixed.

Your approach using the module function might be done as follows, though this is going out of its way to circumvent the current behavior of the module function:

-- mod.lua
local _E = setmetatable({}, {__index=_G})
local _M = {}
package.loaded[...] = _M
module(...)
_E.setfenv(1, _E)
function _M.test()
  return math.sqrt(9)
end
test2 = 1

-- modtest.lua
local m = require "mod"
assert(not mod)
assert(m.test() == 3)
assert(not test)
assert(not test2)
assert(not m.print)
print 'done'

$ luac -p -l mod.lua | lua /usr/local/lua-5.1.3/test/globals.lua
setmetatable    1
_G      1
package 3
module  4
test2   9*
math    7
--DavidManura

Last edited April 9, 2008 2:58 am GMT (diff)