lua-l archive



Beware of patterns if you have not yet canonicalized or pretty-printed your Lua code.

Also beware of comments, including multiline comments that start with long brackets ("--[[", or "--[==[" with any number of "="), whose content you should not alter (you may still pretty-print their opening and closing brackets). Finally, beware of multiline strings, which also start with long brackets; these should be converted to concatenated short strings (with '..'), with an explicit '\n' escape at each newline.
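
As a minimal sketch of that conversion, assuming you have already extracted the value of the multiline string (the helper name long_to_concat is hypothetical, not part of any library):

```lua
-- Hypothetical helper: turn the value of a long string into the source text
-- of an equivalent concatenation of short strings, one per line.
local function long_to_concat(body)
  local lines = {}
  -- The appended "\n" lets the last line match the same pattern as the others.
  for line in (body .. "\n"):gmatch("(.-)\n") do
    lines[#lines + 1] = line
  end
  local parts = {}
  for i, line in ipairs(lines) do
    local quoted = string.format("%q", line)  -- escapes quotes and backslashes
    if i < #lines then
      -- Drop the closing quote, add the explicit \n escape, close again.
      quoted = quoted:sub(1, -2) .. "\\n\""
    end
    parts[i] = quoted
  end
  return table.concat(parts, " ..\n")
end

print(long_to_concat('say "hi"\nbye'))
```

Remember that in real source a newline immediately after the opening long bracket is not part of the string, so it must be dropped before calling such a helper.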

The alternative is to first prefix every line inside these multiline comments or multiline strings with a fixed character that can never start a line of pretty-printed Lua source code. For example you may use '!': it won't complicate your search patterns, which only have to skip any line matching the '^!' pattern.
After your edits, remove every leading '!' (every match of '^!') from the modified pretty-printed code.
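
A rough sketch of this masking step follows. The names mask and unmask are hypothetical, and the scanner is deliberately naive: it assumes every "[[" or "[==[" opens a long string or long comment, so it will be fooled by such brackets inside short strings or after a plain "--" line comment.

```lua
-- Prefix with '!' every line that starts inside a long bracket.
local function mask(src)
  local out, pos = {}, 1
  while true do
    -- Look for an opening long bracket: '[', any number of '=', '['.
    local s, e, eq = src:find("%[(=*)%[", pos)
    if not s then break end
    -- The matching close uses the same number of '=' (plain-text search).
    local cs, ce = src:find("]" .. eq .. "]", e + 1, true)
    if not cs then break end
    out[#out + 1] = src:sub(pos, e)
    -- Mark each continuation line of the body, closing bracket included.
    out[#out + 1] = (src:sub(e + 1, ce):gsub("\n", "\n!"))
    pos = ce + 1
  end
  out[#out + 1] = src:sub(pos)
  return table.concat(out)
end

-- Undo the masking: no unmasked Lua line is expected to start with '!'.
local function unmask(src)
  return (src:gsub("\n!", "\n"))
end

local src = 'local s = [[one\ntwo\n]]\nprint(s) --[==[ c1\nc2 ]==]\nx = 1\n'
local masked = mask(src)
print(masked)
```

Note that the line holding the closing bracket is masked as well, so any code following the bracket on that line is skipped by your patterns too.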

You may also want to canonicalize the semicolons at the end of every statement in the pretty-printed code, after parsing it to know where each statement really ends. This helps resolve the ambiguous newlines that occur when a line starting with an opening parenthesis (a call whose returned value is not assigned) follows an assignment or a return statement: this is actually the same statement and there's no implied ';'. The pretty-printed code should merge these two lines and instead split after the opening parenthesis of the function call.
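
To see the ambiguity concretely, here is a small experiment (a sketch; the chunk text and variable names are made up for the illustration). Lua 5.2 and later read the second and third lines of the chunk as one statement; Lua 5.1 refuses the input with an "ambiguous syntax" error.

```lua
local compile = loadstring or load  -- loadstring in Lua 5.1, load in 5.2+

local src = [[
local function f(x) return x end
local a = f
("hello"):upper()
return a
]]

local chunk, err = compile(src)
if chunk then
  -- Parsed as: local a = f("hello"):upper()
  print(chunk())
else
  -- Lua 5.1 reports the ambiguity instead of choosing a reading.
  print(err)
end
```

A pattern that treats the third line as a separate call statement gets this code wrong in both cases.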

Once you've done this pretty-printing (and the temporary modification of multiline strings or comments), you can apply your patterns to perform code detection/substitution.

But your main difficulty is to determine the scope of identifiers: are they local? Are they parameters? Were parameters hidden in the same scope by a local redeclaration of the same name for a distinct variable? For all that you need a syntax analysis that can determine the start and end of each scope; then you can build a true mapping of the variables used by functions, including variables that are part of an "outer closure", accessible to that function but still not really "global".

You may also want to canonicalize the function calls written without parentheses, adding explicit parentheses around their argument. Beware of priorities: Lua allows the parentheses to be omitted only when the single argument is a string literal or a table constructor, and the call binds tighter than any operator, so 'f "x" .. y' means 'f("x") .. y', not 'f("x" .. y)'. Anything else requires the parentheses: 'f x + y' is a syntax error, and 'f -x' is a binary subtraction between f and x, not the function call 'f(-x)'. Such calls also chain: 'f "a" "b"' means 'f("a")("b")' and not 'f("a"); f("b")', even if there are newlines or comments in between.
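
The following sketch checks how standard Lua reads calls written without parentheses, which are accepted only for a single string literal or table constructor argument (the functions tag and len are made up for the illustration):

```lua
local compile = loadstring or load  -- loadstring in Lua 5.1, load in 5.2+

local function tag(s)               -- returns a function, so calls can chain
  return function(t) return s .. ":" .. t end
end
local function len(v) return #v end

local chained = tag "a" "b"         -- same as tag("a")("b")
local sum     = len "abc" + 1       -- same as len("abc") + 1, not len("abc" + 1)
local count   = len { 10, 20 }      -- same as len({ 10, 20 })

-- Any other kind of argument needs parentheses: this does not compile.
local rejected = compile("return len x + 1")

print(chained, sum, count, rejected)
```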

So some syntactic sugar in Lua (notably calls without parentheses) can be a nightmare for detection by patterns.

If you find no parameter name (for a function) and no local variable with that name in your file, then the occurrence of the name is
- "global" (i.e. in the local scope of the parent including that file by transclusion, something that should not be used),
- or local to that file (if that Lua source file is included by a "require" call), or more exactly part of a file-level closure (the closure is created by the require call itself). This includes the "_G" variable itself, which actually belongs to the parent closure, where it is a local member: it may not be the same _G variable in all contexts where your file may be "require()"d within separate closures, and it could have been reassigned by the code.

You cannot really determine the scope of named members (object.member or object['member']) of any parent object, including _G, without running the code: _G or the parent object may have been reassigned elsewhere (by some function called inside your module file but defined externally), so you cannot assume that they refer to the same object or value.
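
As an illustration of why the text alone is not enough, this sketch runs the same source with two different environment tables (the helper compile_in is hypothetical; it uses setfenv on Lua 5.1 and the env argument of load on Lua 5.2+):

```lua
-- Compile `source` so that its free names are looked up in `env`.
local function compile_in(source, env)
  if setfenv then                            -- Lua 5.1
    return setfenv(loadstring(source), env)
  end
  return load(source, "=demo", "t", env)     -- Lua 5.2 and later
end

-- `counter` is neither a local nor a parameter in this chunk.
local source = "counter = (counter or 0) + 1; return counter"

local env_a, env_b = {}, {}
local chunk_a = compile_in(source, env_a)
local chunk_b = compile_in(source, env_b)

chunk_a(); chunk_a()
chunk_b()

-- The same identifier ended up in two unrelated tables, and not in _G.
print(env_a.counter, env_b.counter, rawget(_G, "counter"))
```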

So the best you can do is to use the regular Lua parser to get a feed of tokens, where the tokens for identifiers carry properties recording their scope (as start/end file positions), the set of variables in parent closures, and the ordered list of parent closures.
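
The scope information attached to identifiers could be kept in a chain of tables like the one sketched here. This is only an assumed data structure (Scope, declare and resolve are hypothetical names); it does not parse anything, a real parser would call it while walking the code.

```lua
local Scope = {}
Scope.__index = Scope

-- Create a scope nested inside `parent` (nil for the file-level scope).
function Scope.new(parent)
  return setmetatable({ parent = parent, vars = {} }, Scope)
end

-- Record a local, parameter or loop variable declared in this scope.
function Scope:declare(name)
  self.vars[name] = true
end

-- Return how many scopes up the name is declared (0 = this scope),
-- or nil when it is declared nowhere in the file.
function Scope:resolve(name)
  local scope, depth = self, 0
  while scope do
    if scope.vars[name] then return depth end
    scope, depth = scope.parent, depth + 1
  end
  return nil
end

local file_scope = Scope.new(nil)
file_scope:declare("config")
local func_scope = Scope.new(file_scope)
func_scope:declare("x")

print(func_scope:resolve("x"), func_scope:resolve("config"), func_scope:resolve("print"))
```

For the refactoring asked about in the original question, the names that resolve outside the extracted block but inside the enclosing function are the candidates for the argument list.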

External variables that are not defined in any locally defined scope are in some parent scope (but not necessarily the same scope as the "_G" defined in that parent scope).

Lexical scopes are not just for function definitions: they also exist in "for" loop statements, and each new declaration of a local creates a derived scope distinct from the first scope of the function definition or of the start of the whole module file (it is also legal to redeclare the same name to create a new variable that will hide the previous one).
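
A short example of these nested scopes, where one identifier names four distinct variables inside a single function (the function demo is made up for the illustration):

```lua
local function demo(x)
  local seen = { x }            -- the parameter
  local x = x * 10              -- a new local; the right-hand side still sees the parameter
  seen[#seen + 1] = x
  for x = 1, 1 do               -- the loop variable has its own scope
    seen[#seen + 1] = x
  end
  do
    local x = "inner"           -- hidden again inside the block
    seen[#seen + 1] = x
  end
  seen[#seen + 1] = x           -- back to the second local
  return seen
end

print(table.concat(demo(2), " "))  -- 2 20 1 inner 20
```

A pattern matching the identifier x cannot tell these four variables apart.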

Finally there's the difficulty of functions defined with the syntax "local function name() ... end", and not "local name = function() ... end". With the first syntax the name is already in scope inside the function body, so the function can refer to itself; this is not the case with the anonymous definition assigned in a "local" declaration, where the name only becomes visible after the whole statement (which is much simpler for the parser), so the same name inside the body refers to an outer variable or to a global. Without "local", "function name() ... end" is a plain assignment to whatever variable of that name is already in scope, or else to a global. The lexical scope of a local is terminated by the "end" keyword of the block or function definition containing it; another local declaration with the same name hides it for the rest of that block, but a mere reassignment of the variable does not end its scope.
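
The difference between the two local forms can be observed directly. In this sketch the names probe_a and probe_b are made up, and it assumes that no global called probe_a exists:

```lua
-- Anonymous function assigned in a local declaration: inside the body,
-- probe_a is NOT this local (its scope starts after the statement),
-- so the name is looked up as a global.
local probe_a = function() return type(probe_a) end

-- "local function" form: inside the body, probe_b is the local itself.
local function probe_b() return type(probe_b) end

print(probe_a(), probe_b())
```

On a stock interpreter the first call reports "nil" and the second "function", although the two definitions look almost identical to a pattern.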

For this reason, using patterns to transform code is very unsafe.




On Thu, 27 June 2019 at 06:45, Abhijit Nandy <abhijit.nandy@gmail.com> wrote:
Hi,

My Lua functions often exceed 500 lines and I then need to refactor blocks in if statements and loops to a separate function. I use MSVC a bit and it has a plugin called Visual Assist which can convert a given chunk of C/C++ code into a function with the proper required arguments. It's very useful in a hurry :)

Has anyone tried something similar for Lua? Basically I would want to give it a block of code and it would need to deduce the required arguments for the function and give me back the code string as "function (args...) ... end".

I am going to try it today with regexes and maybe ltokenp, but wanted to check if someone has already tried it.

A simple converter could perhaps detect patterns like "<0 or more spaces><1 or more valid variable name chars><Lua delimiter like '(' or ',' >" and then put the variable names into the argument list if they are not present in _G. Reserved words skipped.

Not sure of the relevant part in lexer.c yet, that I could perhaps convert to do this.

Thanks,
Abhi