Lua Module Function Critiqued
The module function[1]
has design flaws that encourage poor practices in module design,
potentially leading to bugs and ambiguities through side effects on global
variables, and it should be avoided. The hope is that
this article will further deter the use of the module function
and that the function will be either removed or improved upon in a
future version of Lua.
Before detailing the perils of the module function, note that the choice of whether to use it is more than a personal one, because it affects other authors. It is quite easy for a Lua module author to avoid writing module calls. Indeed, this function is never required for defining modules: it is just a simple helper that wraps common behaviors, and those behaviors are required neither by Lua nor by the other, much more useful parts of the Lua 5.1 module system such as require. However, modules often use modules written by other authors, who may themselves have used the module function, and the module function causes global side effects. Its effects therefore cannot be avoided entirely by choice without modifying the implementation of those other modules.
In practice, use of the module function is fairly common, likely because it is included in the Lua standard libraries,
presumably as a convenience and standardized best practice for module
definition, and because a number of official or reputable Lua sources, such as
the Lua Reference Manual[2] and Programming in Lua (PiL)[3], encourage its
use and even suggest that it is good practice.
New users therefore quickly become accustomed to using it.
The usual way to define a module with the module function is like
this:
    -- hello/world.lua
    module(..., package.seeall)
    local function test(n) print(n) end
    function test1() test(123) end
    function test2() test1(); test1() end
and it is used like this:
    require "hello.world"
    require "anothermodule"
    hello.world.test2()
There are two main complaints about the module function, both of which can be seen if anothermodule is defined like this:
    -- anothermodule.lua
    module(..., package.seeall)
    assert(hello.world.hello.world.print == _G.print) -- weird
    assert(hello ~= nil) -- where'd this come from anyway?
First, the global namespace is accessible by indexing the module table; second, hello is visible in this module even though it was not requested by it.
The first complaint is less inherent
to the module function itself and is due only to the
package.seeall option. package.seeall allows a module to see
global variables, which are normally hidden since the module
function replaces the current environment of the module with a local
one. What package.seeall does is give the module's environment a
metatable that falls back to _G. This allows not only the
module itself to access _G, but the variables in _G also become part
of the module's interface. Among other things, exposing the global
environment through the module table could be
detrimental to sandboxing (see SandBoxes), and these variables might be used
accidentally, but more glaringly it's just plain weird.
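The leak can be reproduced without calling module at all. The following is a minimal sketch, assuming that package.seeall amounts to giving the module table a metatable whose __index is _G. The table M and its function test are hypothetical stand-ins for a real module.

```lua
-- Hypothetical stand-in for a module table created by module(..., package.seeall).
local M = {}

-- Assumed equivalent of package.seeall: fall back to _G for missing keys.
setmetatable(M, { __index = _G })

function M.test() return 1 + 2 end

-- The module's own function works as expected...
assert(M.test() == 3)

-- ...but every global is now reachable through the module's interface too.
assert(M.print == print)
assert(M.string == string)

-- The globals are not stored in M; they are only visible via the metatable.
assert(rawget(M, "print") == nil)
```

Any client holding M can therefore reach the whole global environment through it, which is the behavior the sandboxing concern refers to.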
Luckily, package.seeall is only a convenience option and can
be avoided as such:
    -- hello/world.lua
    local _G = _G
    module(...)
    function test() _G.print(123) end
or
    -- hello/world.lua
    local print = print
    module(...)
    function test() print(123) end
Those are a bit awkward, but there may be other more syntactically pleasing ways to avoid it, such as by recognizing that the module table and the module environment table need not be the same (e.g. see LuaDesignPatterns -- "Module System with Public/Private Namespaces"). We won't go into further detail on this first point.
The second point is that the module function has the side
effect of creating global variables named in ways the programmer
doesn't fully control. On executing module("hello.world"), the
function creates a table named "hello" in the global environment (the
initial global environment, not the current environment set through
setfenv), and stores the module table under the key "world" in that
table. However, if any of those variables already exists with a non-table
value (e.g. someone else placed it there), the function raises an error, which at least
provides some level of safety. The behavior of the module function
can best be understood from the following representation of it in Lua,
taken from LuaCompat[4] (the real
version is in loadlib.c).
    local _LOADED = package.loaded
    function _G.module (modname, ...)
      local ns = _LOADED[modname]
      if type(ns) ~= "table" then
        ns = findtable (_G, modname)
        if not ns then
          error (string.format ("name conflict for module '%s'", modname))
        end
        _LOADED[modname] = ns
      end
      if not ns._NAME then
        ns._NAME = modname
        ns._M = ns
        ns._PACKAGE = gsub (modname, "[^.]*$", "")
      end
      setfenv (2, ns)
      for i, f in ipairs (arg) do
        f (ns)
      end
    end
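The findtable helper is not shown in that excerpt. As a rough sketch of what such a helper has to do, assume that it walks the dotted name, creates any missing tables along the way, and reports a conflict when a non-table value is in the way. The name find_or_create_table and the fake_globals table below are illustrative only and are not taken from LuaCompat.

```lua
-- Hypothetical helper: resolve a dotted name inside `root`, creating
-- intermediate tables as needed. Returns the final table, or nil when a
-- non-table value already occupies one of the names (a "name conflict").
local function find_or_create_table(root, dotted_name)
  local t = root
  for key in string.gmatch(dotted_name, "[^.]+") do
    local nxt = rawget(t, key)
    if nxt == nil then
      nxt = {}
      t[key] = nxt
    elseif type(nxt) ~= "table" then
      return nil
    end
    t = nxt
  end
  return t
end

-- A plain table stands in for the global environment in this sketch.
local fake_globals = {}

-- "hello.world" creates fake_globals.hello and fake_globals.hello.world.
local mod = find_or_create_table(fake_globals, "hello.world")
assert(fake_globals.hello.world == mod)

-- An existing function named "test" blocks a module named "test.more".
fake_globals.test = function() return 1 + 2 end
assert(find_or_create_table(fake_globals, "test.more") == nil)

-- A module sharing a prefix is stored inside the existing module table.
local again = find_or_create_table(fake_globals, "hello.world.again")
assert(mod.again == again)
```

The last assertion shows a further consequence of this scheme: a module whose name extends another module's name ends up as a field of that other module's table.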
The problem arises because different modules, maintained by different people, all write to the global environment. Furthermore, an application using those modules may be writing to the global environment as well. Due to information hiding,[5] neither the application nor the modules should need any knowledge of the internal workings of the other modules, nor, possibly, even of the names of the modules those modules require. The result is that a program lacks control over which global variables get set. Several variants of this problem are illustrated below.
In the following examples, we will as a convenience define modules inline rather than in separate files. For example, rather than creating two files like this
    -- mymodule.lua
    module(...)
    function test() return 1+2 end

    -- mymodule_test.lua
    require "mymodule"
    print(mymodule.test())
we will simply write
    (function()
      module("mymodule")
      function test() return 1+2 end
    end)();

    print(mymodule.test())
Here is the first example:
    (function()
      local require = require
      local print = print
      local module = module
      module("yourmodule");

      (function() module("mymodule") end)()

      print(mymodule ~= nil) -- prints false
    end)();

    print(mymodule ~= nil) -- prints true
As shown, loading such a module always populates the global environment rather than the current environment where the module is used. This is the reverse of what is needed.
    (function()
      local _G = _G
      module("mymodule");

      (function() _G.module("one") end)();
      (function() _G.module("two") end)();
      (function() _G.module("three") end)()

      function hello() _G.print("hello") end
    end)();

    mymodule.hello()
    assert(one and two and three) -- junk
As shown above, if a program loads a module that loads other modules, extra junk is placed in the global namespace.
Another problem, as Mark Hamburg notes[9], is that putting modules into the global namespace hides dependencies. If your program does require "baz" and baz just happens to load foo, then your program could inadvertently depend on foo through the global that loading leaves behind, and your program will break if baz later removes its dependency on foo.
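By contrast, when modules are plain tables returned from require, each dependency has to be stated where it is used. The sketch below defines two modules inline through package.preload so that it runs as a single file. The module names foo and baz come from the text above, while the functions greet and run are hypothetical.

```lua
-- foo: defined inline via package.preload so the sketch is self-contained.
package.preload["foo"] = function()
  local M = {}
  function M.greet() return "hello from foo" end
  return M
end

-- baz: uses foo privately, through a local variable.
package.preload["baz"] = function()
  local foo = require "foo"
  local M = {}
  function M.run() return foo.greet() end
  return M
end

local baz = require "baz"
assert(baz.run() == "hello from foo")

-- Loading baz also loaded foo, but no global named foo was created,
-- so this program cannot come to rely on it by accident.
assert(rawget(_G, "foo") == nil)

-- A program that needs foo must say so explicitly.
local foo = require "foo"
assert(foo.greet() == "hello from foo")
```

If baz later stops requiring foo, the explicit require in the program keeps working, because the dependency is stated where it is used.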
The following two examples are related to each other:
    function test() return 1+2 end

    (function()
      module("mymodule", package.seeall);

      (function()
        module("test.more")  -- fails: name conflict for module 'test.more'
        function hello() return 1+2 end
      end)()
    end)()
and
    (function()
      module("test")
      function check() return true end
    end)();

    (function()
      module("test.check")  -- fails: name conflict for module 'test.check'
      function hello() return 1+2 end
    end)();
As seen, package names and regular variable names conflict.
The module function does detect and raise an error if a global
variable it's overwriting already exists. That's what we want, right?
Well, this also means that it is particularly indeterminate whether
loading a module will succeed, since the module may load other modules
whose names (and the names of whose members) we might not know and that
conflict with global variables.
As a side note, in some other languages (e.g. Perl), variables and package names are maintained in separate namespaces and so are prevented from conflicting.[*3] It is also worth noting that module naming conventions affect whether and how names conflict. For example, Java package names[6] are conventionally prefixed by a (unique) domain name under the author's control, which is verbose but provides a mechanism to avoid conflict. In Perl, CPAN provides a central naming registry to prevent conflicts, and modules with the same prefix indicate a common function rather than a common maintainer (e.g. "CGI"[7] and "CGI::Minimal"[8] are maintained independently by different authors, and "CGI::Minimal" is not stored inside the "CGI" table).
    (function()
      module("mymodule", package.seeall);

      (function()
        module("test.more")
        function hello() return 1+2 end
      end)()

      function greet()
        test.more.hello()  -- fails -- attempt to index global 'test' (a function value)
      end
    end)();

    function test() mymodule.greet() end

    test()
Here, the program inadvertently overwrites a global variable set by the module function. The module function does not detect this. Rather, there is program failure (possibly a silent one) when a module that depends on this global variable attempts to access this variable.
    (function()
      local require = require
      local module = module
      local print = print
      local _P = package.loaded
      module('yourmodule.two');

      (function() module('mymodule.one') end)()

      print(_P['mymodule.one'] ~= nil) -- prints true
    end)();

    local _P = package.loaded
    print(_P['mymodule.one'] ~= nil) -- prints true
Storing modules in the global environment is in fact somewhat redundant
since they are also stored in package.loaded (though without
creating nested tables for the periods in the module name).
~~~
The problems above can be avoided by not using the module function
but instead defining modules in the following simple way:[*1][*2]
    -- hello/world.lua
    local M = {}

    local function test(n) print(n) end
    function M.test1() test(123) end
    function M.test2() M.test1(); M.test1() end

    return M
and importing modules this way:
    local MT = require "hello.world"
    MT.test2()
Note that the public functions are clearly indicated with the M.
prefix. Unlike when using module, the global environment is not
visible through the MT table (i.e. MT.print == nil), the
hello.world table has not been exported to (and has not polluted) the
global environment but is rather held in a lexical (local) variable, and modules with the same prefix
(e.g. hello.world.again) would not alter the hello.world table.
In the client code, the module hello.world can be given a short
abbreviation local to that module (e.g. MT). The approach
also works well with DetectingUndefinedVariables. This is great. The
one complaint is that public functions need to be prefixed with M.
in the module itself, but the other solutions that are often proposed
introduce their own problems and complexities, such as
package.seeall noted above. It does not
particularly hurt to be explicit with M. (two characters),
especially when code size gets larger.
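These properties can be checked directly. The following sketch defines the module inline through package.preload instead of a hello/world.lua file, so that it runs as a single script. The private helper returns its argument rather than printing it, a change made here only so that the result can be asserted.

```lua
-- Inline stand-in for hello/world.lua, registered under the same module name.
package.preload["hello.world"] = function()
  local M = {}

  -- Private helper: local to the module and never exported.
  local function test(n) return n end

  function M.test1() return test(123) end
  function M.test2() return M.test1() + M.test1() end

  return M
end

local MT = require "hello.world"

assert(MT.test1() == 123)
assert(MT.test2() == 246)

-- The global environment is not visible through the module table.
assert(MT.print == nil)

-- The private helper is not part of the interface.
assert(MT.test == nil)

-- No global table named hello was created as a side effect.
assert(rawget(_G, "hello") == nil)

-- The module is still registered under its full name in package.loaded.
assert(package.loaded["hello.world"] == MT)
```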
The module function may introduce more problems than it solves.
[*1] (Advocates of the above style include RiciLake, DavidManura, others who have mentioned it on IRC, and MikePall [10][11].)
[*2] There has also been the suggestion to move the standard libraries in this direction [12].
[*3] Example in Perl where modules and variables of the same name do not conflict:
    package One;
    our $Two = 2;
    package One::Two;
    our $Three = 3;
    package main;
    print "$One::Two,$One::Two::Three" # prints 2,3
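One way to keep the module function from creating the global variables is to pre-populate package.loaded before calling it, since module reuses a table it finds there:
    package.loaded[...] = {}
    module(...) -- you might want to add package.seeall
However, this does not stop the global namespace from being accessible through the module table when package.seeall is used. For that we would need a modified module function, hopefully in the next Lua release.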
    -- mod.lua
    local _E = setmetatable({}, {__index=_G})
    local _M = {}
    package.loaded[...] = _M
    module(...)
    _E.setfenv(1, _E)
    function _M.test()
      return math.sqrt(9)
    end
    test2 = 1

    --modtest.lua
    local m = require "mod"
    assert(not mod)
    assert(m.test() == 3)
    assert(not test)
    assert(not test2)
    assert(not m.print)
    print 'done'

    $ luac -p -l mod.lua | lua /usr/local/lua-5.1.3/test/globals.lua
    setmetatable  1
    _G            1
    package       3
    module        4
    test2         9*
    math          7
See also: luaL_newmetatable/luaL_getmetatable/luaL_checkudata and luaL_register in ProgrammingInLuaComments (for similar reasons)