On 30.04.2013 08:48, steve donovan wrote:
On Tue, Apr 30, 2013 at 1:52 AM, Philipp Janda <siffiejoe@gmx.net> wrote:

I propose the following definition of "globals" in the context of static
global checkers:

*   Any access to a chunk's _ENV upvalue (not a local variable) is a
globals access, unless the chunk itself or any function sharing the same
_ENV upvalue potentially assigns to the _ENV upvalue.
*   Any access to a function's _ENV upvalue (not a local variable) is a
globals access, if the _ENV upvalue of the chunk was the only _ENV in scope
during the function's definition *and* unless any function sharing the same
_ENV upvalue potentially assigns to the _ENV upvalue.
*   Anything else not covered above is not a globals access.
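
To make the three rules concrete, here is a small sketch of how they would classify accesses. It assumes Lua 5.2 or later (where a free name compiles to a field access on _ENV); the `results` table and the function names are only illustrative, and the comments state what a checker following the rules would do, not what any existing tool does.

```lua
-- Assumes Lua 5.2+; `results` and the function names are hypothetical.
local results = {}

-- Rule 1: access at chunk level through the chunk's own _ENV upvalue.
-- Nothing in this chunk assigns to that upvalue, so this is a globals access.
results.rule1 = type(print)

-- Rule 2: the function is defined while the chunk's _ENV is the only
-- _ENV in scope, so `tostring` is a globals access as well.
local function uses_chunk_env()
  return type(tostring)
end
results.rule2 = uses_chunk_env()

-- Rule 3: a local _ENV is in scope, so `tostring` is just a field of
-- that local table and not a globals access; the checker stays quiet.
local function uses_local_env()
  local _ENV = { tostring = "shadowed" }
  return tostring
end
results.rule3 = uses_local_env()
```

Note that `local _ENV = ...` declares a new local variable; it is not an assignment to the chunk's _ENV upvalue, so it does not disable checking for the rest of the chunk.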


OK, I had to read that several times ;)

I'm sorry. I'll try to explain my thinking:
IMO there are three use cases for _ENV:
1) standard access to globals in regular Lua code
2) access to globals in sandboxed Lua code (reduced/slightly modified set of default globals)
3) a Lua dialect where you want to use Lua's syntax for a domain specific language.

For 1) the globals checker is most useful out of the box, because it can catch typos of predefined globals, and it can assume that no metatable magic is at work (or that at least it is compatible with the usual Lua semantics), so you can match reads and writes, etc.

The sandbox case 2) is similar to 1) except you have to supply a different list of predefined globals. The point here is that sandboxes usually load code via one of the load* functions, so no lexical _ENV tampering takes place. And sandboxed code is stored in a separate chunk, so you *can* change the list of predefined globals via a command-line switch.

The third case is the most interesting, because here the above rules come into play. You use _ENV to specify a program in a DSL, like e.g. an LPeg grammar[1], meaning that you probably provide a completely different set of predefined globals, and/or that you catch global accesses via a metatable. In that scenario the usual rules like "anything that you read you must have written before" often don't apply, and the usual Lua library is not available as globals (that would interfere with the metatable magic, and it probably isn't that useful for the DSL anyway). Additionally such code is usually embedded in a chunk of normal Lua code, so you have no (easy) way of telling the globals checker about the different set of globals.

In short: For such "dialects" a globals checker probably produces more false hits than it catches actual typos, so it should leave those dialects alone and concentrate on the surrounding regular Lua code.
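
A minimal sketch of the difference between cases 2) and 3), assuming Lua 5.2 or later. The whitelist table, the chunk name, and the names `expr` and `term` are made up for illustration; a real sandbox or DSL would look different.

```lua
-- Case 2: sandbox. The code lives in its own chunk and receives a reduced
-- environment through load's fourth argument; no lexical _ENV is involved.
local sandbox_env = { tostring = tostring }   -- hypothetical whitelist
local sandboxed = assert(load("return tostring(6 * 7)", "sandboxed", "t", sandbox_env))
local sandbox_result = sandboxed()

-- Case 3: dialect embedded in regular Lua code. A local _ENV with an
-- __index metamethod gives every unknown name a value, so reading a name
-- that was never written is perfectly normal here.
local function rule_names()
  -- `setmetatable` still resolves through the chunk's _ENV, because the
  -- scope of the new local only starts after this statement.
  local _ENV = setmetatable({}, {
    __index = function(_, name) return "<" .. name .. ">" end,
  })
  return expr .. " " .. term
end
local dialect_result = rule_names()
```

A checker using the default list of globals would flag `expr` and `term` as undefined, although the code is correct; that is the kind of false hit the rules are meant to avoid.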

The rules above are for figuring out which parts of the code are considered regular Lua code, and for which parts a customized _ENV has been set lexically (-> "dialect"), so that all bets are off anyway.

  [1]: http://siffiejoe.github.io/lua-luaepnf/#Basic_Usage


Right, globals are usually upvalue references to the special symbol _ENV.
It's not guaranteed that this upvalue actually points to _G, of course,

Right. My assumption is that you mostly apply a globals checker to a Lua file which contains regular Lua code at the topmost level. If not, you can always *not* use the globals checker or supply a different list of predefined global names for this file.

and
_ENV may not be an upvalue if defined as a local (look at code for

That's the point: If _ENV is a local, the programmer has modified the environment (typically to add/remove/replace/collect globals), which means that some embedded Lua dialect is in effect, and the globals checker should react to this (by shutting up, at least for the predefined globals).

print(boo()) here)

local print = print

Here the _ENV in _ENV.print refers to the chunk's upvalue (AFAICT from the code snippet), so the globals checker should check `print`.

local _ENV = {X = 'hoo'}

function boo() return X end

Here, a local _ENV is in effect, so the globals checker should *not* report a write to `boo`. Since a local _ENV (not the chunk's upvalue) was in scope during the definition of `boo`, the globals checker should not report access to X either.

print(boo())

Local _ENV still in effect, so don't report access to `boo` here as well.
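
For reference, the snippet discussed piece by piece above can be assembled into one runnable listing (Lua 5.2 or later assumed). The `do ... end` block, the `result` variable, and the final `rawget` check are additions for illustration only; they limit the scope of the local _ENV so the effect can be inspected afterwards.

```lua
local print = print            -- reads _ENV.print via the chunk's upvalue: checked
local result

do
  local _ENV = {X = 'hoo'}     -- from here on a local _ENV is in effect

  function boo() return X end  -- writes _ENV.boo in the local table: not reported
  result = boo()               -- reads `boo` from the local table: not reported
  print(result)                -- `print` is a local variable, no globals access at all
end

-- `boo` never reached the real globals table.
local boo_is_global = rawget(_G, "boo") ~= nil
```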


Static checkers _could_ be taught to handle this case, but in general _ENV
might be assigned to something dynamically.

But you can check if there is any assignment to the _ENV upvalue, which in some circumstances would change the environment for all functions that share the same upvalue, and you must assume (ignoring a programmer error) that all those functions might use the globals in this changed environment. So the Lua code in those functions is a Lua "dialect", and the default list of globals is incomplete or wrong. You are right that you cannot detect with certainty whether two _ENV accesses actually use the same _ENV value (at least not statically), but IMHO that doesn't matter because the usual rules like "write before read" don't apply anyway.
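
A small sketch of why one assignment to the _ENV upvalue affects every function sharing it, assuming Lua 5.2 or later and that no global named `X` exists. The function names are hypothetical.

```lua
local original_env = _ENV

-- Both functions capture the chunk's _ENV as a shared upvalue.
local function read_x()
  return X                             -- compiled as _ENV.X
end

local function switch_env()
  _ENV = { X = "from replaced env" }   -- assignment to the shared upvalue
end

local before = read_x()                -- looked up in the original environment
switch_env()
local after = read_x()                 -- same function, now a different environment

_ENV = original_env                    -- restore, so later code sees the usual globals
```

The assignment sits in `switch_env`, yet it changes what `read_x` sees. That is why a single potential assignment is enough reason to treat all functions sharing the upvalue as "dialect" code.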


[...]

So we're going beyond plain 'global' access here.  Just finding globals is
fine and dandy, but David M's insight was that we could track _fields_ of
known globals as well.  Further, lglob tracks _aliases_ to known globals
and imported modules.

My rules above are only concerned with globals accesses.


lglob does get plain module() right (as its 5.2 friend '_ENV={}') because
it regards everything after module() or _ENV as a separate scope, and then
tracks accesses in that scope specially.  So 'Answer' here is considered a
problem:

_ENV = {}  -- or spell it 'module(...)' ;)

function answer() return 42 end

function life() return Answer() end

return _ENV

Module code using `module` (or _ENV) is a special variant of the "dialect" case: You have metatable magic (if using package.seeall) or a completely different (empty) set of global library functions (if not using package.seeall). And you have fields you never write to (like _M, _PACKAGE, _NAME, etc.). The only thing is that `module`-using module code is still mostly regular Lua code, so it feels like you could almost support it in a globals checker ...
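
The `_ENV = {}` module from the quoted example can be tried out like this (a sketch, assuming Lua 5.2 or later; the module source is loaded from a string so that the assignment to _ENV only affects that chunk, and the chunk name is made up):

```lua
local source = [[
_ENV = {}  -- replaces the chunk's environment with an empty table

function answer() return 42 end

function life() return Answer() end  -- typo: `Answer` is never defined

return _ENV
]]

local mod = assert(load(source, "module_demo", "t"))()

local answer_value = mod.answer()
local ok = pcall(mod.life)             -- fails at run time: `Answer` is nil
local has_print = mod.print ~= nil     -- the standard library is not in the module table
local leaked = rawget(_G, "answer") ~= nil
```

The typo only shows up when `life` is called, which is exactly what a tool tracking the module scope can catch earlier. At the same time the module table contains none of the standard globals, so the default list of predefined names does not apply inside it.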


Now PA's case involves tracking multiple scopes. This is a silly example,
but it shows the issue.

local function private_business(val)
    local _ENV = {}
    X = val + 1
    Y = val - 1
    return X + Y
end

Again, this can be done, by tracking the scope of local _ENV in functions,
but it seemed a lot of work for a case I did not particularly find
interesting.

I would actually consider this a misuse of _ENV, and I agree that Lua "dialects" are rare, but they will come up from time to time. Applying my rules would result in the globals checker shutting up for this private_business function (and for any chunk using `module` or `_ENV = {}`), thus reducing the number of false hits.
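
Running the quoted function shows what the checker would be complaining about for no reason (sketch, assuming Lua 5.2 or later and no existing globals named `X` or `Y`; the variables `total` and `leaked` are only there for the check):

```lua
local function private_business(val)
  local _ENV = {}   -- private scratch table instead of the globals table
  X = val + 1       -- stored in the local table
  Y = val - 1
  return X + Y
end

local total = private_business(5)

-- Neither name reached the real globals table.
local leaked = rawget(_G, "X") ~= nil or rawget(_G, "Y") ~= nil
```

`X` and `Y` look like undeclared globals, but they never touch the globals table, so a report about them would be a false hit.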


And as for Tim Hill's point - yes, Lua is the best parser, and that's
exactly why we're using the output of the Lua compiler for checking.

The way I understood it, Tim Hill wants to detect free variable accesses (as opposed to _ENV field accesses), and IMO this just doesn't make a difference. The current luac and lbci are good enough at catching _ENV accesses.


steve d.


Philipp