lua-l archive


On Fri, Jul 2, 2010 at 10:22 AM, Mark Hamburg <mark@grubmah.com> wrote:
> The Lightroom approach is to simply run a "linter" over the byte code to look for access
> to local variables not on our white list.  What I haven't dug into yet is how easy this will
> be to adapt to the 5.2 work 3 model for environments. Presumably we need to look for
> field accesses on particular parameters or upvalues. What could be particularly interesting
> relative to 5.1 would be if it were easy to tell the difference between accesses to the
> chunk-level _ENV and accesses to a more local _ENV parameter.

I think the bytecode approach can still be used in 5.2work3, perhaps even more so:

=====
local sqrt = math.sqrt
local x = 1
function f(_ENV)
  x = y1
  y1 = sqrt(x)
end
function f()
  x = y2;
  y2 = sqrt(x)
end
local _ENV = {}
function f()
  x = y3
  y3 = sqrt(x)
end

$ luac -p -l test.lua
main <test.lua:0,0> (11 instructions, 44 bytes at 0x6b8fa0)
0+ params, 4 slots, 1 upvalue, 3 locals, 4 constants, 3 functions
        1       [1]     GETTABUP        0 0 -1  ; _ENV "math"
        2       [1]     GETTABLE        0 0 -2  ; "sqrt"
        3       [2]     LOADK           1 -3    ; 1
        4       [6]     CLOSURE         2 0     ; 0x6b9128
        5       [3]     SETTABUP        0 -4 2  ; _ENV "f"
        6       [10]    CLOSURE         2 1     ; 0x6b93e8
        7       [7]     SETTABUP        0 -4 2  ; _ENV "f"
        8       [11]    NEWTABLE        2 0 0
        9       [15]    CLOSURE         3 2     ; 0x6b95f0
        10      [12]    SETTABLE        2 -4 3  ; "f" -
        11      [15]    RETURN          0 1

function <test.lua:3,6> (7 instructions, 28 bytes at 0x6b9128)
1 param, 3 slots, 2 upvalues, 1 local, 1 constant, 0 functions
        1       [4]     GETTABLE        1 0 -1  ; "y1"
        2       [4]     SETUPVAL        1 0     ; x
        3       [5]     GETUPVAL        1 1     ; sqrt
        4       [5]     GETUPVAL        2 0     ; x
        5       [5]     CALL            1 2 2
        6       [5]     SETTABLE        0 -1 1  ; "y1" -
        7       [6]     RETURN          0 1

function <test.lua:7,10> (7 instructions, 28 bytes at 0x6b93e8)
0 params, 2 slots, 3 upvalues, 0 locals, 1 constant, 0 functions
        1       [8]     GETTABUP        0 1 -1  ; _ENV "y2"
        2       [8]     SETUPVAL        0 0     ; x
        3       [9]     GETUPVAL        0 2     ; sqrt
        4       [9]     GETUPVAL        1 0     ; x
        5       [9]     CALL            0 2 2
        6       [9]     SETTABUP        1 -1 0  ; _ENV "y2"
        7       [10]    RETURN          0 1

function <test.lua:12,15> (7 instructions, 28 bytes at 0x6b95f0)
0 params, 2 slots, 3 upvalues, 0 locals, 1 constant, 0 functions
        1       [13]    GETTABUP        0 1 -1  ; _ENV "y3"
        2       [13]    SETUPVAL        0 0     ; x
        3       [14]    GETUPVAL        0 2     ; sqrt
        4       [14]    GETUPVAL        1 0     ; x
        5       [14]    CALL            0 2 2
        6       [14]    SETTABUP        1 -1 0  ; _ENV "y3"
        7       [15]    RETURN          0 1
=====
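The kind of bytecode check described above can be sketched as a scan over the text that luac prints. This is a minimal, hypothetical linter (`env_accesses` and `lint` are illustrative names, not an existing tool), assuming the 5.2-style `; _ENV "name"` annotations shown in the listing. It only sees the upvalue's name, so the `y3` accesses through the `local _ENV = {}` table would be reported exactly like the chunk-level `y2` ones, while the `y1` accesses through the `_ENV` parameter (plain GETTABLE/SETTABLE) would not be reported at all.

```lua
-- Hypothetical linter over 5.2-style `luac -p -l` output: collect the
-- GETTABUP/SETTABUP instructions annotated `; _ENV "name"` and report
-- names missing from a whitelist.

-- Returns a list of { line = source line, op = opcode, name = key }.
local function env_accesses(listing)
  local hits = {}
  for text in listing:gmatch("[^\n]+") do
    local src, op, name =
      text:match('%[(%d+)%]%s+(%u+)%s.-;%s*_ENV%s+"([^"]*)"')
    if src and (op == "GETTABUP" or op == "SETTABUP") then
      hits[#hits + 1] = { line = tonumber(src), op = op, name = name }
    end
  end
  return hits
end

-- Returns one message per _ENV access whose name is not whitelisted.
local function lint(listing, whitelist)
  local problems = {}
  for _, hit in ipairs(env_accesses(listing)) do
    if not whitelist[hit.name] then
      problems[#problems + 1] =
        ("line %d: %s %s"):format(hit.line, hit.op, hit.name)
    end
  end
  return problems
end

-- A few instructions copied from the listing above.
local sample = [[
        1       [1]     GETTABUP        0 0 -1  ; _ENV "math"
        5       [3]     SETTABUP        0 -4 2  ; _ENV "f"
        1       [4]     GETTABLE        1 0 -1  ; "y1"
        1       [8]     GETTABUP        0 1 -1  ; _ENV "y2"
        6       [9]     SETTABUP        1 -1 0  ; _ENV "y2"
]]

local problems = lint(sample, { math = true, f = true })
for _, msg in ipairs(problems) do print(msg) end
```

With `math` and `f` whitelisted, this prints `line 8: GETTABUP y2` and `line 9: SETTABUP y2`; the `y1` access is invisible to it. In practice the listing text could be read with something like `io.popen("luac -p -l test.lua"):read("*a")`, where `io.popen` is available.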

However, judging from the listing, the bytecode of the last two functions
is essentially identical, and so are the CLOSURE instructions that
instantiate them.  In Lua 5.1, each CLOSURE instruction was followed by
MOVE (or GETUPVAL) pseudo-instructions naming the variables to capture
as upvalues, but that no longer seems to be the case in 5.2 (?).  How
does this work?
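From Lua itself, the debug library can show which `_ENV` each closure actually captured. In the released 5.2 sources, CLOSURE needs no trailing pseudo-instructions because each function prototype carries its own upvalue descriptors (whether the captured variable is a local of the enclosing function or one of its upvalues, plus an index), and `luac -l -l` prints them. The sketch below was written against released Lua 5.2 or later, not work3; `upvalue_named`, `f2` and `f3` are illustrative names mirroring the second and third `f` above.

```lua
-- Requires Lua 5.2 or later: relies on _ENV and on debug.getupvalue
-- reporting upvalue names (present unless debug info was stripped).
local getupvalue = debug.getupvalue

-- Hypothetical helper: return the value of the upvalue called `name`
-- in function `fn`, plus its index, or nil if there is none.
local function upvalue_named(fn, name)
  local i = 1
  while true do
    local n, v = getupvalue(fn, i)
    if n == nil then return nil end
    if n == name then return v, i end
    i = i + 1
  end
end

local chunk_env = _ENV

-- Like the second `f` above: free names go through the chunk-level _ENV.
local function f2() return y2 end

-- Like the third `f` above: free names go through a local _ENV table.
-- The do-block keeps the rest of this chunk on the chunk-level _ENV.
local f3
do
  local _ENV = { y3 = "from the local table" }
  f3 = function() return y3 end
end

print(upvalue_named(f2, "_ENV") == chunk_env)  --> true
print(upvalue_named(f3, "_ENV") == chunk_env)  --> false
print(f3())                                    --> from the local table
```

Both closures have a single upvalue named `_ENV`, which is why their listings look alike, but the values bound to it differ.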


This bytecode analysis is a convenient hack, but AST analysis, as in
some of the luaanalyze/luainspect experiments I've done, can be much
more flexible.  That said, program analysis is sometimes done on a
lower-level representation (e.g. static single assignment form), and a
more in-depth analysis of the luac bytecode output may likewise be
advantageous, say by adapting the LuLu [1] interpreter.

[1] http://lulu.luaforge.net/


> Another interesting option for the import everything into locals first approach
> would be a way to undeclare _ENV in the source code so that accesses past
> a certain point would be flagged as errors.


Probably doable if certain conventions are followed:

=====
local sqrt = math.sqrt
local print = print
__ENV = nil
local function f()
  print(sqrt(2), cos(2))
end
return f

$ luac  -p -l test2.lua
main <test2.lua:0,0> (7 instructions, 28 bytes at 0x6b8fa0)
0+ params, 3 slots, 1 upvalue, 3 locals, 5 constants, 1 function
        1       [1]     GETTABUP        0 0 -1  ; _ENV "math"
        2       [1]     GETTABLE        0 0 -2  ; "sqrt"
        3       [2]     GETTABUP        1 0 -3  ; _ENV "print"
        4       [3]     SETTABUP        0 -4 -5 ; _ENV "__ENV" nil
        5       [6]     CLOSURE         2 0     ; 0x6b9208
        6       [7]     RETURN          2 2
        7       [7]     RETURN          0 1

function <test2.lua:4,6> (9 instructions, 36 bytes at 0x6b9208)
0 params, 4 slots, 3 upvalues, 0 locals, 2 constants, 0 functions
        1       [5]     GETUPVAL        0 0     ; print
        2       [5]     GETUPVAL        1 1     ; sqrt
        3       [5]     LOADK           2 -1    ; 2
        4       [5]     CALL            1 2 2
        5       [5]     GETTABUP        2 2 -2  ; _ENV "cos"
        6       [5]     LOADK           3 -1    ; 2
        7       [5]     CALL            2 2 0
        8       [5]     CALL            0 0 1
        9       [6]     RETURN          0 1
=====

The bytecode shows that __ENV is set to nil on source line 3 of main,
yet the global "cos" is accessed on line 5.  If you wish, you can also
confirm here that line 3 always executes: there are no jumps among
instructions 1 through 4 of main.
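The convention above could be checked mechanically from the same listing. This is a rough, hypothetical sketch (`check_marker` and `BRANCH_OPS` are illustrative names): it assumes, as in the listing, that luac prints the main chunk first, finds the SETTABUP that assigns the marker `__ENV`, records whether any possibly branching instruction precedes it in main, and reports every `_ENV` access on a later source line. The branch-opcode set is an approximation of the 5.2 opcode list and is deliberately conservative.

```lua
-- Hypothetical check for the `__ENV = nil` convention, run over the text
-- of a 5.2-style `luac -p -l` listing. Assumes main is printed first.

-- Opcodes that may transfer control (an approximation; LOADBOOL is
-- included because it can skip the next instruction).
local BRANCH_OPS = {
  JMP = true, EQ = true, LT = true, LE = true, TEST = true, TESTSET = true,
  FORPREP = true, FORLOOP = true, TFORLOOP = true, LOADBOOL = true,
}

-- Returns the source line of the marker assignment (or nil), whether a
-- possibly branching instruction precedes it in main, and a list of
-- _ENV accesses on later source lines (in any function).
local function check_marker(listing, marker)
  local in_main, marker_line, branched = false, nil, false
  local late = {}
  for text in listing:gmatch("[^\n]+") do
    if text:match("^main <") then
      in_main = true
    elseif text:match("^function <") then
      in_main = false
    else
      local src, op = text:match("%[(%d+)%]%s+(%u+)")
      if src then
        src = tonumber(src)
        local name = text:match(';%s*_ENV%s+"([^"]*)"')
        if in_main and marker_line == nil then
          if op == "SETTABUP" and name == marker then
            marker_line = src
          elseif BRANCH_OPS[op] then
            branched = true
          end
        elseif marker_line and name and src > marker_line then
          late[#late + 1] = ("line %d: %s"):format(src, name)
        end
      end
    end
  end
  return marker_line, branched, late
end

-- The test2.lua listing from above.
local sample = [[
main <test2.lua:0,0> (7 instructions, 28 bytes at 0x6b8fa0)
        1       [1]     GETTABUP        0 0 -1  ; _ENV "math"
        2       [1]     GETTABLE        0 0 -2  ; "sqrt"
        3       [2]     GETTABUP        1 0 -3  ; _ENV "print"
        4       [3]     SETTABUP        0 -4 -5 ; _ENV "__ENV" nil
        5       [6]     CLOSURE         2 0     ; 0x6b9208
        6       [7]     RETURN          2 2
        7       [7]     RETURN          0 1
function <test2.lua:4,6> (9 instructions, 36 bytes at 0x6b9208)
        1       [5]     GETUPVAL        0 0     ; print
        2       [5]     GETUPVAL        1 1     ; sqrt
        3       [5]     LOADK           2 -1    ; 2
        4       [5]     CALL            1 2 2
        5       [5]     GETTABUP        2 2 -2  ; _ENV "cos"
        6       [5]     LOADK           3 -1    ; 2
        7       [5]     CALL            2 2 0
        8       [5]     CALL            0 0 1
        9       [6]     RETURN          0 1
]]

local marker_line, branched, late = check_marker(sample, "__ENV")
print(marker_line, branched)
for _, msg in ipairs(late) do print(msg) end
```

On this listing it finds the marker on source line 3 with no preceding branch, and reports `line 5: cos`. Comparing source lines rather than control flow keeps the sketch small, but it means a function defined before the marker and called later would not be flagged.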