lua-l archive



Hi list,

About 3 months ago, Xavier suggested extending Peter Shook's (in) syntax to allow "namelist 'in' expr" anywhere the parser might otherwise accept a standard exprlist.

Those of us involved in the discussion agreed that this would be a handy extension. Gavin's suggested use case was:

for _,record in ipairs(list) do
  display(name,adr,email,job in record)
end

Sadly, it seemed likely to be a troublesome patch to write, as distinguishing the namelist case from a more general exprlist looked like it would require some kind of multi-pass parsing. Xavier seemed pretty confident it could be done, though.

Yesterday afternoon, I let myself take another look at lparser.c, in hopes of finding a simple implementation.  While implementing Xavier's grammar mod in the general case still looks like it would require substantial parser changes, I did manage to put together a lightweight hack that appears to support Gavin's use case.  

My approach was to limit the syntax change to the case where funcargs() is given a series of arguments of the form "namelist 'in' <local or upvalue>". In that case, each element of the namelist will have generated exactly one instruction, so once the parser hits the 'in' it's relatively straightforward to go back over the instructions emitted for the argument list and convert each argument push into the appropriate table get (OP_GETTABLE or OP_GETTABUP).
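To make the rewrite concrete, here is a self-contained toy model of it. It is a sketch under stated assumptions, not Lua's code: instructions are plain structs rather than Lua 5.2's packed 32-bit words, and ToyFunc, string_k() and convert_in() are hypothetical names. Only the RK macros copy the operand encoding from Lua 5.2's lopcodes.h. Each OP_MOVE (from a local) or OP_GETUPVAL that pushed an argument becomes an OP_GETTABLE or OP_GETTABUP that reads the same-named field from the source table, keeping the destination register.

```c
#include <assert.h>
#include <string.h>

/* RK operand encoding as in Lua 5.2's lopcodes.h: bit 8 marks a constant. */
#define BITRK      (1 << 8)
#define RKASK(x)   ((x) | BITRK)
#define ISK(x)     ((x) & BITRK)
#define INDEXK(x)  ((int)(x) & ~BITRK)
#define MAXINDEXRK (BITRK - 1)

/* Toy model: opcode names follow Lua 5.2, but instructions are unpacked
   structs rather than Lua's 32-bit words. */
typedef enum { OP_MOVE, OP_GETUPVAL, OP_GETTABLE, OP_GETTABUP, OP_ADD } OpCode;
typedef struct { OpCode op; int a, b, c; } Instr;
typedef enum { TABLE_LOCAL, TABLE_UPVAL } TableKind;

/* Hypothetical function state: local names by register, upvalue names by
   index, and the string constant table. */
typedef struct {
    const char **locals; int nactvar;
    const char **upvals; int nups;
    const char *k[MAXINDEXRK + 1]; int nk;
} ToyFunc;

/* Find or add string constant s (in the spirit of luaK_stringK);
   returns -1 if it would not fit in an RK operand. */
static int string_k(ToyFunc *fs, const char *s) {
    for (int i = 0; i < fs->nk; i++)
        if (strcmp(fs->k[i], s) == 0) return i;
    if (fs->nk > MAXINDEXRK) return -1;
    fs->k[fs->nk] = s;
    return fs->nk++;
}

/* Rewrite code[pc0..pc1) in place: a MOVE from a local or a GETUPVAL
   becomes "R(a) := table[name]", where name is the variable's name.
   Returns -1 at the first instruction that is not a plain name push;
   earlier instructions stay rewritten (the real patch raises a syntax
   error at that point anyway). */
static int convert_in(ToyFunc *fs, Instr *code, int pc0, int pc1,
                      TableKind kind, int table) {
    assert(pc0 <= pc1);
    for (int p = pc0; p < pc1; p++) {
        const char *name = NULL;
        int b = code[p].b;
        if (code[p].op == OP_MOVE && b >= 0 && b < fs->nactvar)
            name = fs->locals[b];
        else if (code[p].op == OP_GETUPVAL && b >= 0 && b < fs->nups)
            name = fs->upvals[b];
        if (name == NULL) return -1;
        int k = string_k(fs, name);
        if (k < 0) return -1;
        code[p].op = (kind == TABLE_LOCAL) ? OP_GETTABLE : OP_GETTABUP;
        code[p].b = table;      /* table register, or upvalue index */
        code[p].c = RKASK(k);   /* field name as an RK constant */
        /* code[p].a, the destination register, is unchanged */
    }
    return 0;
}
```

For display(name, adr, email in record), with record in register 0 and the three arguments pushed into registers 4 to 6 of the toy, the two MOVEs and the GETUPVAL become loads of record["name"], record["adr"] and record["email"] into the same registers. The rewrite stays one-for-one only because a local or upvalue table can be named directly as the B operand; an OP_ADD or any other instruction is rejected.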

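I still don't fully understand how Lua's bytecode generation works, so I'm not certain that these instruction rewrites are safe. But the approach appears to work in practice, and I have a strong hunch that limiting the source table to an upvalue or local simplifies things enough to make the hack safe: either one can be used directly as the table operand of OP_GETTABUP or OP_GETTABLE, so each argument's instruction can be rewritten in place without emitting anything new.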

Gavin asked for an email if anyone got a patch of this form working. While I've only implemented a scaled-back form of Xavier's proposal, I think it will cover a good chunk of the remaining cases where I'd want to use "namelist 'in' expr" in place of an exprlist.

I suspect that with a little care, the instruction-rewriting approach could also be used inside constructor(), which would allow things like "local t = {a, b, c in t}". I haven't tried this yet, though :)

-Sven

PS: I'm not yet confident enough in my code to package it as a patch. In particular, I'm not yet sure I'm properly extracting the variable names associated with an OP_MOVE or OP_GETUPVAL, though I am currently passing all the tests I can think of.
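On the OP_MOVE question: in Lua 5.2 an active local i lives in register i, and getlocvar(fs, i) finds its name through an indirection (actvar.arr[fs->firstlocal + i].idx, an index into f->locvars). So the bound that matters is fs->nactvar; f->sizelocvars is only the allocated size of the debug-info array, which also holds locals whose scope has ended. A minimal toy model of that indirection, with hypothetical type and function names and the firstlocal offset dropped:

```c
#include <assert.h>
#include <string.h>

/* Toy model (hypothetical names) of Lua 5.2's local-variable bookkeeping:
   locvars holds debug info for every local declared so far, including
   locals whose scope has ended; actvar maps each active local i, which
   lives in register i, to its entry in locvars. */
typedef struct { const char *varname; } LocVar;

typedef struct {
    const LocVar *locvars; int nlocvars;  /* all locals declared so far */
    const int *actvar;     int nactvar;   /* active locals, by register */
} ToyFuncState;

/* Name of the local held in register reg, or NULL when reg is not an
   active local (e.g. a temporary). Mirrors getlocvar() guarded by nactvar. */
static const char *local_name_for_register(const ToyFuncState *fs, int reg) {
    if (reg < 0 || reg >= fs->nactvar) return NULL;
    assert(fs->actvar[reg] < fs->nlocvars);  /* like lua_assert in getlocvar */
    return fs->locvars[fs->actvar[reg]].varname;
}

/* Register of the innermost active local called name, or -1, scanning
   from the innermost scope outwards as searchvar() does. */
static int register_for_local(const ToyFuncState *fs, const char *name) {
    for (int i = fs->nactvar - 1; i >= 0; i--)
        if (strcmp(local_name_for_register(fs, i), name) == 0) return i;
    return -1;
}
```

If "a" is declared, then "tmp" inside a block that has since ended, then "b", register 1 holds "b" even though locvars[1] is "tmp"; and register 2 is a temporary, which a bound of nlocvars (3 here) would wrongly accept.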

For the benefit of those of you who enjoy messing around with parser hacks, here's my current code:

I've modified the '(' case of funcargs as follows:

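  case '(': {  /* funcargs -> `(' [ explist | namelist in var] `)' */
      // declared before the label, since a label must be followed by
      // a statement rather than a declaration (before C23)
      int pc0;
      int num_expressions;

      luaX_next(ls);

      start_explist:

      pc0=fs->pc;
      num_expressions=0;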

      if (ls->t.token == ')')  /* arg list is empty? */
        args.k = VVOID;
      else {
        num_expressions = explist(ls, &args);
        luaK_setmultret(fs, &args);
      }

      if( testnext(ls,TK_IN) && num_expressions) {
          luaK_exp2nextreg(fs, &args);  /* close latest argument */  

          // if we don't have exactly 1 instruction per expression, then
          // we're certainly not dealing with a namelist, so we can throw
          // a syntax error.
          if(num_expressions!=fs->pc-pc0) {
            luaX_syntaxerror(ls,
                "function (in) arguments are not a simple name list,");
          }

          // attempt to rewrite the instructions between pc0 and fs->pc
          // as OP_GETTABLE or OP_GETTABUP's.  
          
          convert_in_instructions(ls,fs,pc0);

          // allow multiple 'in' lists to be passed as arguments
          if( testnext(ls,',') ) goto start_explist;
          else args.k = VVOID;
      }

      check_match(ls, ')', '(', line);
      break;
    }
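Because of the goto start_explist loop, each 'in' claims every argument parsed since the previous 'in' group (or since the opening parenthesis), and a group may be followed by ',' and more arguments. The toy below replays that grouping on a flat token list, producing source-level strings instead of bytecode. It is only an illustration: expand_in_args() is a hypothetical helper, every token other than "," and "in" is assumed to be a plain name, and unlike the real patch it does not check that the table is a local or upvalue.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARG_LEN 32

/* Hypothetical helper: expands tok[0..ntok) into out[], following the
   grouping used by the patched funcargs(). Returns the number of
   arguments, or -1 on a malformed list. */
static int expand_in_args(const char **tok, int ntok,
                          char out[][ARG_LEN], int maxout) {
    int n = 0;       /* arguments written to out[] so far */
    int group = 0;   /* first argument of the current group */
    int i = 0;
    assert(tok != NULL && out != NULL);
    while (i < ntok) {
        if (n == maxout) return -1;
        snprintf(out[n++], ARG_LEN, "%s", tok[i++]);   /* a name */
        if (i == ntok) break;
        if (strcmp(tok[i], ",") == 0) { i++; continue; }
        if (strcmp(tok[i], "in") != 0 || i + 1 == ntok) return -1;
        /* 'in <table>': rewrite the whole group as table.name */
        const char *table = tok[i + 1];
        for (int j = group; j < n; j++) {
            char name[ARG_LEN];
            snprintf(name, sizeof name, "%s", out[j]);
            snprintf(out[j], ARG_LEN, "%s.%s", table, name);
        }
        i += 2;
        group = n;                       /* like "goto start_explist" */
        if (i < ntok && strcmp(tok[i], ",") != 0) return -1;
        i++;                             /* skip the ',' (or run off the end) */
    }
    return n;
}
```

With this grouping, f(name, adr in record, x, y in u, z) expands to record.name, record.adr, u.x, u.y and a plain z.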

My instruction converter then looks like this:

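// primaryexp() is defined after funcargs() in lparser.c, so declare it
// here for the converter, which has to come before funcargs().
static void primaryexp (LexState *ls, expdesc *v);

// as long as the explist parsed by funcargs() was, in fact,
// a namelist, instruction conversion should be possible.
// and, in those cases where funcargs() wasn't passed a
// namelist, we should generally be able to throw a sensible
// error.
static void convert_in_instructions(LexState *ls, FuncState *fs, int pc0) {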
    int pc1=fs->pc;

    Proto *f=fs->f;

    // parse the table expression.
    // we'll end up throwing a syntax error if this is anything
    // more complex than an upvalue or a local.
    expdesc table_e;
    primaryexp(ls, &table_e);

    for(int p=pc0;p<pc1;p++) {
        Instruction instruct=f->code[p];
        int op=GET_OPCODE(instruct);
        int a=GETARG_A(instruct);
        int b=GETARG_B(instruct);
        int c=GETARG_C(instruct);

        TString * field_name=NULL;

        switch(op) {

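        // a true local variable in the namelist generates an OP_MOVE.
        // registers below fs->nactvar hold the active locals, and
        // getlocvar() takes that register (active-variable) index.
        case OP_MOVE: 
            if(b<fs->nactvar) {
                field_name = getlocvar(fs, b)->varname;
            }
            break;  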

        // an upvalue, meanwhile, generates an OP_GETUPVAL
        case OP_GETUPVAL: 
            if( b<f->sizeupvalues ) {
                field_name=f->upvalues[b].name;
            }
            break;
        
        // a global name, on the other hand, generates a table
        // index.  this is a trickier case to handle well.
        case OP_GETTABUP: 
        case OP_GETTABLE: 
            if(ISK(c)) {
                int kindex=INDEXK(c);
                TValue *kvalue = &f->k[kindex];

                if (ttisstring(kvalue)) { 
                    field_name=rawtsvalue(kvalue);
                }

                // there's at least one case where a non-namelist will generate
                // instructions that end up being converted, namely:
                // f(_ENV["var"] in b)
                // the _ENV["var"] expression is, at this point,
                // indistinguishable from a simple global name.
                //
                // any table access other than _ENV, however, can be identified 
                // as a syntax error.
                int v = searchvar(fs, ls->envn); 
                if (v >= 0) {
                    // _ENV is a local at v, so, OP_GETTABUP or v!=b indicates an
                    // error
                    if(op==OP_GETTABUP || v!=b ) {
                        luaX_syntaxerror(ls,"function call (in) was not given a simple name list "
                        "(found a local table access),");
                    }
                }
                else {
                    // _ENV is an upvalue at v, so, OP_GETTABLE or v!=b indicates
                    // an error.
                    v = searchupvalue(fs, ls->envn);
                    if(op==OP_GETTABLE || v!= b) {
                        luaX_syntaxerror(ls,"function call (in) was not given a simple name list "
                            "(found an upvalue table access),");
                    }
                }
            }

            // i'm fairly sure that when singlevar generates a global, it
            // will always have a constant key.  thus, !ISK(c) implies a
            // syntax error.
            break;          
        }


        if(!field_name) luaX_syntaxerror(ls,
                            "function call (in) was not given a simple name list,");
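        expdesc key;
        codestring(ls, &key, field_name);

        // RKASK() only works for constant indices up to MAXINDEXRK;
        // luaK_exp2RK() would load a larger one into a register, which
        // needs an extra instruction, so give up instead.
        if(key.u.info > MAXINDEXRK) luaX_syntaxerror(ls,
                            "function call (in) needs too many constants,");

        int field_name_const=RKASK(key.u.info);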

        int table_info=table_e.u.info;

        switch(table_e.k) {
        
        case VUPVAL: 
            f->code[p]=CREATE_ABC(OP_GETTABUP, a, table_info, field_name_const);
            break;

        case VLOCAL: 
            f->code[p]=CREATE_ABC(OP_GETTABLE, a, table_info, field_name_const);
            break;
    
        default:
            luaX_syntaxerror(ls,"function call (in) does not reference a local table,");

        }
    }
}
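The _ENV checks in the OP_GETTABUP/OP_GETTABLE case boil down to one predicate: a plain global name compiles to an index of _ENV, so the instruction's B operand must be _ENV's register when _ENV is a local (OP_GETTABLE) or _ENV's upvalue index when it is an upvalue (OP_GETTABUP). Here it is isolated as a hypothetical helper over toy types; this sketch is not lparser.c code, and the caller is assumed to have already located _ENV (as searchvar()/searchupvalue() do in the converter).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for Lua 5.2's two indexing opcodes (not the real values). */
typedef enum { OP_GETTABLE, OP_GETTABUP } IndexOp;

/* Where _ENV lives in the function being compiled: a local in register
   `index`, or an upvalue with upvalue index `index`. */
typedef struct { bool is_local; int index; } EnvLocation;

/* True if "op a b K" could be a plain global-name read, i.e. an index of
   _ENV; any other table read means the argument was not a simple name. */
static bool is_global_name_access(IndexOp op, int b, EnvLocation env) {
    assert(b >= 0);
    if (env.is_local)
        return op == OP_GETTABLE && b == env.index;   /* R(b) must be _ENV */
    return op == OP_GETTABUP && b == env.index;       /* UpValue[b] must be _ENV */
}
```

As the converter's comments note, this still cannot tell f(_ENV["var"] in b) from f(var in b), since both compile to the same instruction.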