lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Thu, Nov 13, 2008 at 6:51 PM, Fabien wrote:
> A more challenging approach would be to try to keep the generated code as
> structurally similar as possible to the original Lua input. Of course, the
> more "tricky primitives" you rule out (setfenv, setmetatable, coroutine.*,
> ...), the easier it is to achieve structural integrity;...
> A more interesting approach would be a gracefully degrading one: as your
> program start to introduce usage of trickier primitives, you "degrade" to
> more faithful but less readable translation schemes.

Some observations...

C code can be translated fairly "mechanically" into Lua or other
languages.  This has been seen with Yueliang[1], which reimplements
the Lua compiler in Lua, and in some of my own reimplementations of C
code in Lua[2-3].  A fairly mechanical translation does have some
performance and stylistic trade-offs--e.g. C char arrays might be
represented as Lua table arrays of string chars rather than as a
regular Lua string as a more native reimplementation might use.  On
the other hand, a mechanical translation could feasibly be performed
automatically by a program.

There may, however, be the 5% of the code where a very awkward
translation would result if performed entirely mechanically.  In the
case of the Lua string Library[2], the C code used gotos, which don't
have direct analogy in Lua.  However, on closer examination the gotos
were used to express a type of tail-call recursion, which might not
have a direct representation in C (hence the gotos) though does exist
in Lua.  So, the ideal approach here was to manually transform the
gotos in tail-call style recursion in C and then more mechanically
convert that C into Lua.

Some other examples are found in lua2c[4], which translates Lua code
to C code containing Lua C API calls.  One might think that an
operation such as "a + b" in Lua should correspond to "a + b" in C.
Unfortunately, it is more complicated.  a and b could be objects
containing metamethods.  The most general solution is to convert the
expression into basically "f(a,b)" where f is a function that adds the
two arguments in the Lua way, taking metamethods into account.  lua2c,
which aims to be a general translator, converts that expression as
follows:

/* __add metamethod handler (helper function) */
static void lc_add(lua_State * L, int idxa, int idxb) {
  if (lua_isnumber(L,idxa) && lua_isnumber(L,idxb)) {
    lua_pushnumber(L,lua_tonumber(L,idxa) + lua_tonumber(L,idxb));
  }
  else {
    if (luaL_getmetafield(L,idxa,"__add")||luaL_getmetafield(L,idxb,"__add")) {
      lua_pushvalue(L,idxa < 0 && idxa > LUA_REGISTRYINDEX ? idxa-1 : idxa);
      lua_pushvalue(L,idxb < 0 && idxb > LUA_REGISTRYINDEX ? idxb-2 : idxb);
      lua_call(L,2,1);
    }
    else {
      luaL_error(L, "attempt to perform arithmetic");
    }
  }
}

...

  /* return a + b */
  lua_getfield(L,LUA_ENVIRONINDEX,"a");
  lua_getfield(L,LUA_ENVIRONINDEX,"b");
  lc_add(L,-2,-1);
  lua_remove(L,-2);
  lua_remove(L,-2);
  return 1;

Improvements are possible in special cases.  If a and b are literals
(e.g. 1 and 2), lua2c converts that simply to

  /* return 1 + 2 */
  lua_pushnumber(L,3);
  return 1;

For cases in-between (e.g. a and b are known to be simple numbers),
similar optimizations could be made, such as avoiding metamethod
calls.  The fact that "a and b are simple numbers" can be an inference
determined automatically in special cases by data flow analysis
methods, as done in optimizing compilers.  Alternately, type
annotations (pragmas) can be manully added to the source as specially
formatted comments to provide this information.  Some initial work is
being done in lua2c to support the latter (which is simpler to
implement and more general); however, ideally, the former is desired
because it is more automatic for the user (though harder to implement
well and intractable in the general case).

BTW, the short-circuiting example given above basically as

  if arr1.field1<arr2.field1  or arr1.field2>arr2[33][15] then
   call_func2("xyz")
  end

is converted to this in lua2c:

  /* if arr1.field1<arr2.field1  or arr1.field2>arr2[33][15] then */
  enum { lc1 = 0 };
  lua_getfield(L,LUA_ENVIRONINDEX,"arr1");
  lua_pushliteral(L,"field1");
  lua_gettable(L,-2);
  lua_remove(L,-2);
  lua_getfield(L,LUA_ENVIRONINDEX,"arr2");
  lua_pushliteral(L,"field1");
  lua_gettable(L,-2);
  lua_remove(L,-2);
  const int lc2 = lua_lessthan(L,-2,-1);
  lua_pop(L,2);
  lua_pushboolean(L,lc2);
  if (!(lua_toboolean(L,-1))) {
    lua_pop(L,1);
    lua_getfield(L,LUA_ENVIRONINDEX,"arr2");
    lua_pushnumber(L,33);
    lua_gettable(L,-2);
    lua_remove(L,-2);
    lua_pushnumber(L,15);
    lua_gettable(L,-2);
    lua_remove(L,-2);
    lua_getfield(L,LUA_ENVIRONINDEX,"arr1");
    lua_pushliteral(L,"field2");
    lua_gettable(L,-2);
    lua_remove(L,-2);
    const int lc3 = lua_lessthan(L,-2,-1);
    lua_pop(L,2);
    lua_pushboolean(L,lc3);
  }
  const int lc4 = lua_toboolean(L,-1);
  lua_pop(L,1);
  if (lc4) {

    /* call_func2("xyz") */
    lua_getfield(L,LUA_ENVIRONINDEX,"call_func2");
    lua_pushliteral(L,"xyz");
    lua_call(L,1,0);
    assert(lua_gettop(L) - lc_nextra == 0);
  }

The expression inside the conditional was evaluated not in place but
rather as a series of statements prior to the conditional.  The
conditional itself simply becomes "if (lc4)".  So, Lua
short-circuiting behavior is maintained, at the expense of some
structural differences introduced in the code (the addition of another
"if" statement).

lua2c is quite reliable in translation, though there remain a few
problem areas noted[4].  The main one (for some users) is lack of
coroutine support.  This can surely be added in a number of ways
suggested, but I haven't spent time on it.  Some of these may be less
portable (Coco), while others probably would further distort the
generated code (simulating stacks on the heap and adding flow control
statements to make functions resumable, which is similar to approaches
used to simulate coroutines in pure C or Perl[5-6]).  Another
difficult area was closure/upvalue support, but it is simulated in C
at a performance cost (though the cost is only paid when upvalues are
used in code).

Now, the effort to reimplement Lua in Lua[1] is interesting here.  If
you want to reimplement Lua in some language X (e.g. JavaScript), it
can be sufficient to write a translator that translates code to X from
the subset of Lua that the Lua-in-Lua code uses because then you can
use that translator to convert Lua-in-Lua to X.  For one, it's
unlikely that the Lua-in-Lua code is implemented in terms of difficult
language features like coroutines discussed above.  So, I was able
just now to take Yueliang and convert it to C using lua2c to obtain a
fully compliant Lua compiler implemented in terms of the Lua C API.
If Yueliang reimplemented the Lua VM as well, the same could be done
with that as well, and it would be fully conformant (even supporting
coroutines).  Indeed, it's somewhat pointless to translate a Lua
implementation of Lua back to C, but the general approach may have
some use in translating the Lua VM, compiler, and associated libraries
to JavaScript.  Similarly, Mochalua[7] (a reimplementation of the Lua
VM in Java) does not currently have a compiler; however, one could
compile Yueliang to bytecodes and run that in Mochalua.  Similarly, if
one did not have a Lua string library or interpreter in some
reimplementation of the Lua VM, one could compile the Lua
reimplementations[2-3] to bytecodes and use that.  There is a certain
value in having alternate pure Lua implementations of such C code.

[1] http://yueliang.luaforge.net/
[2] http://lua-users.org/wiki/StringLibraryInLua
[3] http://lua-users.org/wiki/LuaInterpreterInLua
[4] http://lua-users.org/wiki/LuaToCee
[5] http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
[6] http://math2.org/eulermb/pod/EulerMB/Coroutine.html
[7] http://code.google.com/p/mochalua/