• Subject: Re: On implementing a functions whitelist for a sandbox
• From: Sean Conner <sean@...>
• Date: Wed, 7 Aug 2019 13:24:23 -0400

```
It was thus said that the Great Kynn Jones once stated:
> On Wed, Aug 7, 2019 at 12:22 AM Sean Conner <sean@conman.org> wrote:
>
> > It was thus said that the Great Kynn Jones once stated:
> > > Regarding your question, it seems to me that the whitelist should
> > > include (a) functions that the loaded code invokes, directly or
> > > indirectly; (b) functions that get called by the interpreter in the
> > > process of running the loaded code (e.g. functions that intercept
> > > errors in the loaded code).
> >
>
> Let me elaborate on this a bit.  What I meant to say is that the
> functions included in the whitelist can be classified into two groups.
>
> The first group consists of functions that we have included in the
> whitelist because we consider them integral to what we have decided to
> allow the loaded code to do (for example perform mathematical
> operations); these are functions that the loaded code invokes directly
> or indirectly.
>
> The second group of functions in the whitelist are other functions,
> which are *not* invoked directly or indirectly by the loaded code, but
> that get called nonetheless by the Lua interpreter in the process of
> running the loaded code.  These include functions that get invoked
> when the loaded code has a runtime error.

The first thing to understand is scope. There are three concepts here:
globals, locals and upvalues.

a = 1
local b = 2

local function f(c)
  local d = a + b + c
  return d
end

Here, a is a global and b is a local.  Declaring function f() (which is
itself local) creates a new scope for the locals c (a parameter is still a
local) and d.  From inside f(), a is still a global, c and d are locals, and
b is an upvalue.  An upvalue is a local variable from an enclosing scope that
an inner function refers to.

> For example, if functions A and B are defined as
>
>     function A (x) return B(x) end
>
>     function B (x) return math.random(x) end

I'm going to change things up a bit here.

local function B(x) return math.random(x) end
local function A(x) return B(x) end

local test = [[print(A(1/0))]]

test contains the code we're going to load and execute.  To run, it needs
references to print() and A(), which we can provide:

local env = { print = print , A = A }
local f = load(test,"=(load)","t",env)
f()

stdin:2: bad argument #1 to 'random' (number has no integer representation)
stack traceback:
[C]: in function 'math.random'
stdin:2: in function <stdin:2>
(...tail calls...)
stdin:7: in main chunk
[C]: in ?

This is as expected.  Notice how we have given a limited "global"
environment to the loaded code that consists of print() and A().  print() is
a standard library function written in C, and it has access to everything it
needs.  B() is an upvalue of A(), so A() already holds a reference to it, and
there's no need to include B() in the limited "global" environment.

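Now B().  In B(), math is a global, and now I have to go into a slight
digression about how globals work in Lua.  Every function that refers to a
global has an implicit upvalue pointing to a table of global variables.  Lua
sets it up on behalf of your code: load() initializes a chunk's _ENV to the
global table (or to the env argument, if one is given), and nested functions
capture it as an upvalue.  It has a name (_ENV), but it is not necessarily
the first upvalue, nor is it necessarily the first upvalue that's a table.
So B() has _ENV as an upvalue, and that is where it finds math.  Upvalues
can be shared among functions: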

local function C(x) return math.sin(x * 2) end
local function D(x) return math.cos(x / 2) end

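Here, both C() and D() share the same _ENV upvalue (normally, all functions
compiled in the same chunk share the same _ENV upvalue, unless you overwrite
it).  So A() has a reference to B(), and B() has a reference to _ENV, which
holds math.  Because B() was compiled in the host chunk, its _ENV is the
host's global table, not env, which is why math doesn't need to be in the
limited "global" environment.  If, instead of giving the chunk a global
environment with print() and A(), you give it an empty global environment,
you'll see: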

(load):1: attempt to call a nil value (global 'A')
stack traceback:
stdin:6: in main chunk
[C]: in ?

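If A() and B() were global variables, A() would still have to be in the
limited "global" environment, since the loaded code looks it up there.  B()
would not: A() was compiled in the host chunk, so it finds B() through the
host's _ENV, not through env.  And for the following chunk of code: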

test = [[
local function B(x) return math.random(x) end
local function A(x) return B(x) end

print(A(1/0))
]]

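You'll need print() and math in the limited "global" environment for this
to work.  A() and B() are now compiled inside the chunk, so B() looks up
math through the chunk's _ENV, which is env:

local env = { print = print , math = math }
local f = load(test,"=(load)","t",env)
f()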

stack traceback:
[C]: in function 'math.random'
(...tail calls...)
stdin:10: in main chunk
[C]: in ?

> ...and the loaded code is the string 'print(A(1/0))', then `print`,
> `A`, and integer division are being *directly* invoked by the loaded
> code, while `B` and `math.random` are being *indirectly* invoked by
>
> On the other hand, the "unnamed code X" responsible for catching the
> ensuing error
>
>     bad argument #1 to 'random' (number has no integer representation)
>
> ...and printing the useful error message for it is not being invoked,

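Which does happen, even though nothing that handles errors is in the
chunk's limited "global" environment: the error is caught and reported by
the host (here, the stand-alone interpreter), not by the loaded code.  Try
running the code presented.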

-spc

```
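
The global/local/upvalue distinction from the first example can be checked at runtime. This is only an illustrative sketch, assuming Lua 5.2 or later and unstripped debug information. It walks f's upvalues with debug.getupvalue() and records their names in a table (upnames, a hypothetical name). Only _ENV and b should show up. a is reached through _ENV, while c and d are ordinary locals of f.

```lua
-- Sketch, assuming Lua 5.2+ and debug info that has not been stripped.
a = 1            -- global: stored in the table that _ENV refers to
local b = 2      -- local to this chunk

local function f(c)    -- c is a parameter, hence a local of f
  local d = a + b + c  -- d is a local of f; a goes through _ENV; b is an upvalue
  return d
end

-- Hypothetical helper table: the names of f's upvalues.
local upnames = {}
local i = 1
while true do
  local name = debug.getupvalue(f, i)  -- nil once i is past the last upvalue
  if name == nil then break end
  upnames[name] = true
  i = i + 1
end

print(f(3))  --> 6
for name in pairs(upnames) do print("upvalue:", name) end  -- _ENV and b, in any order
```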
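
The restricted-environment example can be sketched as a self-contained script. It assumes Lua 5.3 or 5.4, where load() accepts an env argument and math.random() rejects inf. The chunk name "=(load)" matches the traces in the message. The variables ok_chunk and bad_chunk are illustrative names, and the "return A(1)" chunk is a hypothetical variant added so the successful case has a checkable result. pcall() stands in for the REPL's error reporting.

```lua
-- Sketch, assuming Lua 5.3 or 5.4.
local function B(x) return math.random(x) end
local function A(x) return B(x) end  -- B is an upvalue of A

-- The sandbox sees only print and A; B and math stay reachable through A's upvalues.
local env = { print = print, A = A }

-- Hypothetical variant of the chunk that returns a value instead of printing it.
local ok_chunk = assert(load("return A(1)", "=(load)", "t", env))
local value = ok_chunk()  -- math.random(1) always returns 1

-- The chunk from the message: 1/0 is inf, which math.random rejects.
local bad_chunk = assert(load("print(A(1/0))", "=(load)", "t", env))
local ok, err = pcall(bad_chunk)

print(value)    --> 1
print(ok, err)  -- false, followed by the "bad argument #1 to 'random'" message
```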
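
Whether two functions share an _ENV upvalue can be tested with debug.upvalueid(), which returns a unique identifier for an upvalue. The sketch assumes Lua 5.2 or later with debug information available. env_index() and square() are hypothetical helpers. It also shows that a function compiled by load() with its own env table gets a different _ENV, and that a function which touches no globals has no _ENV upvalue at all.

```lua
-- Sketch, assuming Lua 5.2+ with debug info.
-- Hypothetical helper: index of the upvalue named "_ENV" in fn, or nil.
local function env_index(fn)
  local i = 1
  while true do
    local name = debug.getupvalue(fn, i)
    if name == nil then return nil end
    if name == "_ENV" then return i end
    i = i + 1
  end
end

local function C(x) return math.sin(x * 2) end
local function D(x) return math.cos(x / 2) end

-- C and D were compiled in the same chunk, so their _ENV upvalues are the
-- same variable: this chunk's own _ENV.
local same = debug.upvalueid(C, env_index(C)) == debug.upvalueid(D, env_index(D))

-- A function compiled by load() with its own env table gets a different _ENV.
local E = assert(load("return function(x) return math.tan(x) end", "=E", "t", { math = math }))()
local different = debug.upvalueid(C, env_index(C)) ~= debug.upvalueid(E, env_index(E))

-- Hypothetical helper that touches no globals, so it captures no _ENV.
local function square(x) return x * x end

print(same)               --> true
print(different)          --> true
print(env_index(square))  --> nil
```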
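
The empty-environment failure can be reproduced with pcall(). This is a minimal sketch, assuming Lua 5.3 or 5.4. The exact wording of the error differs between versions, so only the "global 'A'" part of the message is relied on.

```lua
-- Sketch, assuming Lua 5.3 or 5.4.
local function B(x) return math.random(x) end
local function A(x) return B(x) end  -- exists in the host, but the chunk can't see it

local test = [[print(A(1/0))]]

-- An empty table as the chunk's global environment: neither print nor A is visible.
local f = assert(load(test, "=(load)", "t", {}))
local ok, err = pcall(f)

print(ok)   --> false
print(err)  -- attempt to call a nil value (global 'A'), with a "(load):1:" prefix
```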
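
When the chunk defines A() and B() itself, the whitelist has to cover math, because B()'s _ENV is then the sandbox table rather than the host's globals. The sketch below assumes Lua 5.3 or 5.4. It uses a hypothetical variant of the chunk that returns A(1) instead of printing A(1/0), so the result can be checked.

```lua
-- Sketch, assuming Lua 5.3 or 5.4. A and B are defined inside the loaded
-- chunk, so B's _ENV is whatever table load() was given.
local test = [[
local function B(x) return math.random(x) end
local function A(x) return B(x) end
return A(1)
]]

-- With math whitelisted, the chunk runs.
local with_math = assert(load(test, "=(load)", "t", { math = math }))
local value = with_math()
print(value)  --> 1

-- Without it, B fails as soon as it looks up the global 'math'.
local without_math = assert(load(test, "=(load)", "t", {}))
local ok, err = pcall(without_math)
print(ok, err)  -- false, followed by an "index a nil value (global 'math')" message
```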
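
The last point, that whatever catches and reports the error runs without being in the whitelist, can be shown from a host program. This is a minimal sketch, assuming Lua 5.3 or 5.4. The host passes debug.traceback as the message handler to xpcall(), roughly as the stand-alone interpreter does, while the sandbox env contains only A.

```lua
-- Sketch, assuming Lua 5.3 or 5.4. The message handler runs in the host,
-- so debug.traceback never has to appear in the sandbox's env.
local function B(x) return math.random(x) end
local function A(x) return B(x) end

local env = { A = A }  -- no print, no debug, no error-handling functions
local chunk = assert(load("return A(1/0)", "=(load)", "t", env))

-- debug.traceback(msg) returns msg followed by "\nstack traceback:\n..."
local ok, report = xpcall(chunk, debug.traceback)

print(ok)      --> false
print(report)  -- the error message plus a stack traceback
```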
