

On Thu, 13 Aug 2009 00:59:40 -0400, David Manura wrote:
> Let some module A in a program do require "strict" and some other
> module B in the program do module(..., package.seeall).  Now, strict
> behavior is applied to the implementation of B.  This may be desirable
> or it might not be.

In my opinion, this is *not* desirable.  If A is written using
"strict", then "strict" should apply only to A, and not to modules
that use A.

> I've long felt there are significant flaws in how the "module"
> function works [1] because it encourages certain practices and has
> subtle behaviors that make writing reliable and secure programs more
> difficult.

I agree.  In the short time I've been using Lua, I have missed the
kind of module systems I have used with some other languages, in
particular those that enforce isolation between modules and control
over their imports and exports.

As I only use Lua in my spare time (outside of work), I was able to
let a number of ideas percolate in my brain for a while before I had
time to do anything about it.  Then, during a June vacation from work,
I did an implementation (on a Linux laptop, by the side of a lake in
the mountains, which is a wonderful way to code).

If there is interest, I could find a place to post my implementation.
(On Luaforge?)  The rest of this email is a description of what I
designed and implemented.  I borrowed heavily from ML and Scheme48 and
other places, while trying hard to keep the result in the spirit of
Lua.  Every time you see the word "structure", you should think
"module" or "package".  Oh, and I'm using a rather narrow definition
of "binding" in my explanations [2].

My system achieves these goals:

(1) Backwards compatibility with all of the existing module features,
which continue to work as they do today.  You can run your code in an
environment that has a mix of regular Lua modules and "structures",
which are what my system defines.

(2) The ability to declare "structures", which are proper modules that
import the bindings they need, and export the bindings they want to
expose.  The name "structure" is from ML and Scheme48.   It was a
convenient choice because both "module" and "package" are already used
in Lua.  The declaration of a structure is typically separate from the
code that implements it, making it easy to reuse existing code without
modifying any files.

(3) When you open a structure S, its exported bindings become visible
to you.  You get a copy of the bindings, which means you cannot affect
other code that uses S.  The copy is necessary because Lua does not
support read-only tables (without using metatables).

(4) Structure declarations are short, and can refer to (re-use)
existing code (both Lua and object libraries).  Most of the time, the
existing file of code does not have to be modified at all, whether it
uses Lua's "module" function or not.

(5) You can easily create a "user" package that is a tightly
controlled environment for running user code.  My primary goals were
enforced namespace control (secure == no leaks!) and isolation (robust
module code == no unwanted interactions!).
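
To make goal (3) concrete, here is a minimal sketch of the copying idea. It is not Darwin's code; copy_bindings and the exports table are hypothetical names chosen for illustration.

```lua
-- Hypothetical helper: return a shallow copy of a table of exported bindings.
-- Only the name-to-value mappings are copied, not the values themselves.
local function copy_bindings(exports)
  local copy = {}
  for name, value in pairs(exports) do
    copy[name] = value
  end
  return copy
end

-- A stand-in for the export table of some structure S.
local exports = { greet = function() return "hello" end }

-- Two clients open S; each receives its own copy of the bindings.
local client_a = copy_bindings(exports)
local client_b = copy_bindings(exports)

-- Client A rebinds a name in its own copy...
client_a.greet = function() return "hijacked" end

-- ...but client B and the original exports are unaffected.
print(client_a.greet())                         --> hijacked
print(client_b.greet())                         --> hello
print(rawequal(client_b.greet, exports.greet))  --> true
```

The last line shows that the copy shares the function value with the original; only the binding is duplicated.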

My system, called Darwin, supplies a function structure.open(modname)
which opens a structure (a module).  A configuration option causes
Darwin to insert entries in package.preload of the form
[modname]=structure.open, which effectively extends the existing Lua
"require" function to first look for a defined structure to open
before trying other loaders.
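
The package.preload mechanism can be sketched roughly as follows. This is an illustration under assumptions, not Darwin's implementation: the declared table and open_structure are hypothetical stand-ins for Darwin's registry and for structure.open.

```lua
-- Hypothetical registry of declared structures (name -> exported bindings).
local declared = {
  demo = { answer = function() return 42 end },
}

-- Hypothetical stand-in for structure.open: hand back a copy of the exports.
-- The standard require calls a preload entry with the module name.
local function open_structure(modname)
  local exports = declared[modname]
  if not exports then
    error("no structure declared: " .. tostring(modname))
  end
  local copy = {}
  for name, value in pairs(exports) do
    copy[name] = value
  end
  return copy
end

-- Register every declared structure so that require finds it
-- before trying the other loaders.
for modname in pairs(declared) do
  package.preload[modname] = open_structure
end

local demo = require("demo")
print(demo.answer())  --> 42
```

Note that the stock require caches its result in package.loaded, so a second require "demo" returns the same table. How Darwin's own require treats repeated opens is not covered by this sketch.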

I have been using Darwin for all of my (spare time) Lua coding since
June, and so far it is doing everything I want.  I've tested various
bits of Lua code available online, including pure C libraries, pure
Lua libraries, and hybrids (e.g. "lanes").  I'm sure there are bugs
and perhaps unintuitive behavior lurking still in Darwin, though the
rate at which I am finding them has dropped off.  Perhaps by sharing
Darwin, others can help me find more flaws.

The implementation is around 950 lines of Lua code, about 200 of which
are re-implementations of most of loadlib.c.  (The loadlib functions
are effectively closed over a single package table, and Darwin
provides a package table for each structure that needs one.)  The
run-time space is also larger than it could be, adding between 4k and 8k
(mostly copies of bindings) for each structure you load, depending on
whether that structure uses _G (around 3k) or _G plus all the standard
libraries (around 8k).  If there were a clean way to support read-only
tables, then the overhead incurred by copying tables of bindings would
go away.

Note: I do not like the idea of using a metatable to create a
read-only table in this particular situation, because you need to
protect the metatable in order to prevent user code from escaping its
"jail" (which is the structure in which it executes).  But once you
protect the metatable, you break a useful (and widely used) feature of
Lua, which is the ability to change the metatable of the environment.
That is why I chose a copying approach.  Darwin works fine with
"strict" -- the effects of "strict" are limited to only the structures
that use it.
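
For readers who have not tried the metatable approach, here is a small sketch of the trade-off described above. The readonly helper is a hypothetical name, and this is the alternative I rejected, not what Darwin does.

```lua
-- Hypothetical helper: wrap a table in a read-only proxy.
local function readonly(t)
  return setmetatable({}, {
    __index = t,
    __newindex = function(_, key)
      error("attempt to modify read-only binding '" .. tostring(key) .. "'", 2)
    end,
    -- Protect the metatable; without this field, code holding the proxy
    -- could call getmetatable and remove __newindex or reach t directly.
    __metatable = "protected",
  })
end

local env = readonly({ print = print })

-- Reads go through to the underlying table.
print(env.print == print)                     --> true

-- Writes are rejected.
print((pcall(function() env.print = nil end)))  --> false

-- The cost: nobody can install a different metatable on env any more,
-- which is what a module like "strict" relies on doing.
print((pcall(setmetatable, env, {})))         --> false
print(getmetatable(env))                      --> protected
```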

Here's a quick example using Lua Lanes 2.0.3, which I installed
unmodified.  After playing with it, I wrote this declaration [3]:

> structure.declare { name="lanes";
                      open={"_G", "package", "table", "string"};
                      environment=[[ require("lanes"); return lanes ]];
                    }
> require "lanes"
> dofile "lanes-test.lua"
done	pending	
waiting...	
5083	true	
6500	true	
. running	
....... cancelled	
Lane starts!
1 received
1 sent
2 received
2 sent
......              [Remainder of test output deleted]

The declaration of the structure "lanes" uses Lua's require function
(reimplemented in Darwin), which uses the loaders in the package
table.  By declaring the structure, an entry for "lanes" was put into
package.preload, so that I could then type 'require "lanes"' to open
the lanes structure.

As you can guess from the example declaration, the "environment" field
of a declaration contains code that returns the table that will become
the environment of the module.  There are other fields that can be
used in a declaration as well, e.g. to load in files of code and to
direct where the bindings should appear when the structure is opened
(so that you can put them anywhere you want, even at top level).
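
As a rough illustration of how an "environment" string could be turned into a table, consider the sketch below. It is only an assumption about the general shape; build_environment is a hypothetical name, and Darwin's actual handling (for example, which bindings the chunk can see while it runs) is not shown.

```lua
-- loadstring exists in Lua 5.1; later versions use load for strings.
local compile = loadstring or load

-- Hypothetical helper: compile a chunk of source text, run it, and
-- insist that it returns the table that will serve as the environment.
local function build_environment(source)
  local chunk, err = compile(source)
  if not chunk then
    error("bad environment code: " .. tostring(err))
  end
  local env = chunk()
  if type(env) ~= "table" then
    error("environment code must return a table")
  end
  return env
end

local env = build_environment([[ return { answer = 42 } ]])
print(env.answer)  --> 42
```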

I think the design is the right one to enable separate compilation of
modules in such a way that compiled module code retains its run-time
behavior when mixed with other code -- it is isolated.  But I have not
had time to look into this yet.  Also, there may be an opportunity to
exploit weak populations to reduce run-time space, i.e. to use weak
tables for imported bindings so that, e.g. if you only use two
functions in _G, your copies of the other bindings will be collected.
Of course, this can break reflective code.
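
The weak-table idea can be sketched like this. It is an untested direction rather than something Darwin does today, and import_weak and make_exports are hypothetical names.

```lua
-- Hypothetical helper: copy bindings into a table with weak values, so a
-- binding survives only while something else still refers to its value.
local function import_weak(exports)
  local imported = setmetatable({}, { __mode = "v" })
  for name, value in pairs(exports) do
    imported[name] = value
  end
  return imported
end

-- A stand-in for a structure's exports, created fresh so that nothing
-- else holds on to the functions.
local function make_exports()
  return {
    used   = function() return "used" end,
    unused = function() return "unused" end,
  }
end

local imported = import_weak(make_exports())
local used = imported.used  -- a strong reference keeps this binding alive

collectgarbage()
collectgarbage()

print(used())                -- the strongly held function still works
print(imported.used == used) -- and its binding is still present
print(imported.unused)       -- expected to be nil once the collector has run
```

This also shows the reflection problem: code that iterates over the imported table with pairs may no longer see "unused" after a collection.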

I conjecture that a couple of small changes to the implementation of
Lua (not to the language) would allow me to shrink the Darwin code to
about 500 lines, and also to reduce the run-time space used by the
private tables of bindings in each structure.

Putting aside any changes to Lua, I'd like to know if there's interest
in Darwin.  If so, then I'll write up some docs and put those and the
code online.   Would anyone find this useful?

--Jim

> [1] http://lua-users.org/wiki/LuaModuleFunctionCritiqued

[2]   A binding is a mapping from a name to a location in the store
(which is an abstraction of memory).  The binding of "dofile" in _G
maps the name "dofile" to a location that happens to contain a
function.  An assignment to "dofile" (e.g. dofile=4) causes a change
in its binding -- the name will map to a different location.   The
function to which "dofile" was bound still exists, and there may well
be other bindings to the location containing that function.  When
there are no bindings (or only weak bindings) to that location, the
function that implements the dofile operation can be garbage
collected.  (This definition technically only applies to boxed values,
but it generalizes to "location or unboxed value".)  My point is
to illustrate that a Darwin structure does not have its own copies of
all of the functions from the structures it opens (e.g. _G, string, os,
lanes, ...).  A Darwin structure has copies of *bindings* to those
functions.

[3] In the example above, I typed in the declaration for "lanes".  In
normal practice, I have some files containing such declarations, and I
load the relevant ones into my interactive Lua state.  Since
structures are not loaded until they are opened, I can have lots of
declared structures and they take up almost no space until I start
using them (e.g. with "require(modname)").