On 12/29/2010 04:28 PM, Luiz Henrique de Figueiredo wrote:
> > I spend most of my Lua coding time writing compilers which produce
> > PUC-Rio Lua VM bytecode
>
> Could you please share some details about this or show examples of the
> input languages?

I've got two variations. One compiler (on hold at the moment) has a gradual type system: type annotations are optional, and instead of relying on type inference, the compiler inserts run-time type checks wherever typed and untyped code mix. The input language for this variation looks like this:

https://github.com/richardhundt/gaia/blob/master/lang/kudu/test/test.js
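
As a rough illustration of what "run-time checks where typed and untyped code mix" can mean, here is a minimal sketch in plain Lua. It is not taken from the gaia sources and is not what the compiler emits; the helpers `check` and `typed` are hypothetical names, and types are simply the strings returned by Lua's `type`.

```lua
-- Hypothetical helper: raise an error unless `value` has the expected Lua type.
local function check(value, expected, what)
   local actual = type(value)
   if actual ~= expected then
      -- Level 0: do not prepend position information to the message.
      error(("%s: expected %s, got %s"):format(what, expected, actual), 0)
   end
   return value
end

-- Hypothetical helper: wrap an "annotated" function so that every call,
-- including calls from unannotated code, is checked at run time.
local function typed(param_types, return_type, fn)
   return function(...)
      local args = { ... }
      for i, expected in ipairs(param_types) do
         check(args[i], expected, "argument #" .. i)
      end
      return check(fn(...), return_type, "return value")
   end
end

-- "Typed" code, standing in for: add(a : number, b : number) : number
local add = typed({ "number", "number" }, "number", function(a, b)
   return a + b
end)

-- "Untyped" code calling across the boundary.
print(add(1, 2))              -- passes both checks
print(pcall(add, 1, "two"))   -- the inserted check rejects the string
```

A compiler with static knowledge of both sides could omit the wrapper wherever caller and callee are both annotated, which is the point of inserting checks only at the boundary.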

A more dynamic version has a nominal type system but no static type checking. It's simpler, so I'm pushing this one first until I've got modules and import/export done. It looks like this:

https://github.com/richardhundt/gaia/blob/master/lang/kudu/test/class.js

The syntax is heavily based on the abandoned ECMAScript 4 draft specification (hence the .js extension), but since the parser is built on LPeg, it's pretty easy to change.

The interesting stuff for you is probably the code generator:

https://github.com/richardhundt/gaia/blob/master/src/gaia/codegen.lua

Register allocation is still a bit sketchy and there are a few other TODOs, so it's not mature. You can see the code generator being used by the compiler:

https://github.com/richardhundt/gaia/blob/master/lang/kudu/src/compiler.lua

It does this sort of thing:

      Ops{
         Call{ Id"require", String"kudu.runtime" };
         Call{ Index{ Id"kudu", String"load" } };
         Local{ { Id"this" }; Index{ Id"kudu", String"null" } };
         ...
         Call{ Id"(init)" };
         Return{ Id"__package__" };
      }

This is a table-based op tree, inspired by Metalua's syntax, which the code generator converts to bytecode.
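
To show how constructors such as `Ops`, `Call` and `Id` can build such a tree out of ordinary tables, here is a minimal sketch. It assumes a Metalua-like layout (a `tag` field plus children in the array part); the `Node` factory and the `render` walker are hypothetical, and the real constructors in codegen.lua may well be organised differently.

```lua
-- Hypothetical factory: Node"Call" returns a constructor that tags a table
-- (or wraps a bare string in one) as a node of that kind.
local function Node(tag)
   return function(body)
      if type(body) ~= "table" then body = { body } end
      body.tag = tag
      return body
   end
end

local Ops, Call, Return = Node"Ops", Node"Call", Node"Return"
local Id, String = Node"Id", Node"String"

-- The tree is plain Lua data: tag in the hash part, children in the array part.
local tree = Ops{
   Call{ Id"require", String"kudu.runtime" };
   Return{ Id"__package__" };
}

-- A toy "back end": walk the tree and render it as text. A real code
-- generator would emit VM instructions at the same points instead.
local function render(node)
   if type(node) ~= "table" then return ("%q"):format(node) end
   local parts = {}
   for i, child in ipairs(node) do parts[i] = render(child) end
   return node.tag .. "(" .. table.concat(parts, ", ") .. ")"
end

print(render(tree))
--> Ops(Call(Id("require"), String("kudu.runtime")), Return(Id("__package__")))
```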

Also of interest might be the grammar (minus the type expressions, which have been factored out for now):

https://github.com/richardhundt/gaia/blob/master/lang/kudu/src/grammar.lua

It uses an LPeg wrapper specialized for (procedural) programming languages. The wrapper has shortcuts for building commonly used expression-matching patterns, so you can write this sort of thing (where "p" is the parser object):

local expr_base = p:express"expr_base" :primary"term"
expr_base:op_infix("**"):prec(35)
expr_base:op_prefix"new":prec(40)
expr_base:op_postfix("++", "--"):prec(35)
expr_base:op_ternary"?:":prec(2)
expr_base:op_circumfix"()":prec(50)
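
Declarations like these boil down to a table of operators with binding powers. The following sketch shows the general idea with precedence climbing in plain Lua. It is not how the LPeg wrapper works internally; the operator set, the precedence numbers and all function names are assumptions made for illustration.

```lua
-- Hypothetical operator table: binding power and associativity per infix operator.
local infix = {
   ["+"]  = { prec = 20, apply = function(a, b) return a + b end },
   ["*"]  = { prec = 30, apply = function(a, b) return a * b end },
   ["**"] = { prec = 35, right = true, apply = function(a, b) return a ^ b end },
}

-- Split the input into number, "**" and single-character tokens.
local function tokenize(src)
   local tokens, pos = {}, 1
   while true do
      pos = src:find("%S", pos)
      if not pos then break end
      local tok = src:match("^%d+", pos) or src:match("^%*%*", pos) or src:sub(pos, pos)
      tokens[#tokens + 1] = tok
      pos = pos + #tok
   end
   return tokens
end

local function parse(tokens)
   local i = 1
   local expr   -- forward declaration, defined below

   -- A term is a number or a parenthesised expression (the "circumfix" case).
   local function term()
      local tok = tokens[i]
      i = i + 1
      if tok == "(" then
         local value = expr(0)
         assert(tokens[i] == ")", "expected ')'")
         i = i + 1
         return value
      end
      return assert(tonumber(tok), "expected a number")
   end

   function expr(min_prec)
      local left = term()
      while true do
         local op = infix[tokens[i]]
         if not op or op.prec < min_prec then return left end
         i = i + 1
         -- Left-associative operators need a strictly tighter right-hand side.
         local right = expr(op.right and op.prec or op.prec + 1)
         left = op.apply(left, right)
      end
   end

   return expr(0)
end

local function evaluate(src) return parse(tokenize(src)) end

print(evaluate("1 + 2 * 3"))            --> 7
print(evaluate("(1 + 2) * 3"))          --> 9
print(evaluate("2 ** 3 ** 2") == 512)   --> true (right-associative: 2 ** 9)
```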


It's all pretty rough at the moment (and the parser library is arguably overcooked), but this whole project is something of an experiment to see how far the goals of the Parrot VM can be achieved on the Lua VM.

It's interesting for me because Lua has gone in the exact opposite direction to Parrot. Whereas Parrot tries to cram in every conceivable feature for every conceivable language, Lua has reduced everything to the minimum and kept performance high enough that the object model, native types, etc. can be implemented using tables and userdata, and still perform well.


> Also, can't you emit Lua source code instead of bytecode? As you
> probably know, bytecode is not portable across versions of Lua;
> e.g., 5.1 bytecode does not run in 5.2 and vice-versa.


There are two reasons for going to bytecode. The first is that Lua, the language, doesn't let you do arbitrary branching, so a language with a 'continue label' construct is hard to translate into Lua source. The second is that when producing bytecode you've got control over the debug data (line numbers, variable names, etc.) which you see in stack traces.
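
To illustrate the branching problem, here is a sketch of one common workaround for a labelled continue when targeting Lua 5.1 source. The input syntax in the comment is made up, and the flag-based translation is only one possible scheme, not something taken from the compiler discussed here.

```lua
-- Source-language intent (hypothetical syntax):
--   outer: for i in 1..3 { for j in 1..3 { if (j == 2) continue outer; visit(i, j) }
--                          done(i) }
local visited = {}

for i = 1, 3 do
   local continue_outer = false   -- flag standing in for the label
   for j = 1, 3 do
      if j == 2 then
         continue_outer = true
         break                    -- leave the inner loop first...
      end
      visited[#visited + 1] = i .. ":" .. j
   end
   if not continue_outer then
      -- ...then guard everything that follows the inner loop.
      visited[#visited + 1] = i .. ":done"
   end
end

print(table.concat(visited, " "))   --> 1:1 2:1 3:1
```

Every extra nesting level needs another flag and another guard, whereas in bytecode the same construct is a single jump to the top of the outer loop.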

If Lua had goto with labels and line-number hinting (like Perl's during string eval), then I'd have stuck to generating Lua source.
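
The line-number limitation is easy to see from plain Lua. In this sketch the chunk name "demo.kudu" and the generated text are invented; the point is only that the loader lets you choose the chunk name, while the reported line is always the line within the generated Lua text.

```lua
-- load() takes over from loadstring() in Lua 5.2 and later; use whichever exists.
local compile = loadstring or load

-- Pretend this Lua text was generated from line 10 of a hypothetical "demo.kudu".
local generated = table.concat({
   "local x = 1",
   "local y = 2",
   "error('boom')",
}, "\n")

-- The chunk name can be chosen freely (a leading "=" means: use it verbatim)...
local chunk = assert(compile(generated, "=demo.kudu"))
local ok, message = pcall(chunk)

-- ...but the line number refers to the generated text, not to the input file.
print(ok, message)   --> false   demo.kudu:3: boom
```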

With the Lua AST layer in between, it's possible to swap out back-ends to produce different bytecode. Supporting standard 5.2 bytecode is next and pretty easy, since the differences aren't huge. LuaJIT 2 (LJ2) would be nice, but more challenging, and I'm not sure if this project has the legs to get that far. I'd need a killer language for it, and I've discovered that designing the language is harder than implementing it.
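
Since each back-end targets one bytecode format, it helps to see how a PUC-Rio chunk identifies its version. The sketch below reads the header produced by `string.dump`: a four-byte signature followed by a version byte (major in the high nibble, minor in the low one). The function name `bytecode_version` is hypothetical, and other implementations such as LuaJIT use a different header, for which it returns nil.

```lua
-- Return the "major.minor" version stamped into a PUC-Rio bytecode chunk,
-- or nil if the string does not start with the PUC-Rio signature.
local function bytecode_version(chunk)
   if chunk:sub(1, 4) ~= "\27Lua" then return nil end
   local byte = chunk:byte(5)
   return ("%d.%d"):format(math.floor(byte / 16), byte % 16)
end

local dumped = string.dump(function() return 42 end)

-- On a PUC-Rio interpreter the two values should agree, e.g. "Lua 5.1" and "5.1".
print(_VERSION, bytecode_version(dumped))
```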

Another potential avenue to explore would be Linear Genetic Programming (LGP) [1] on the Lua VM, where one would use the bytecode generator to create and mutate the program population. I think the Lua VM has real potential here because of the small instruction set and limited number of types, and its footprint is small enough to run tournaments over pretty large populations.
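
For readers unfamiliar with LGP, here is a toy sketch of the create/mutate/tournament loop. It deliberately does not touch Lua bytecode: a made-up three-operation register machine stands in for the VM, and the target function, population size and every name are assumptions chosen for illustration.

```lua
math.randomseed(42)

local OPS = {
   function(a, b) return a + b end,
   function(a, b) return a - b end,
   function(a, b) return a * b end,
}
local NUM_REGS, PROGRAM_LEN, POP_SIZE = 4, 6, 50

local function random_instruction()
   return { op = math.random(#OPS), dst = math.random(NUM_REGS),
            a = math.random(NUM_REGS), b = math.random(NUM_REGS) }
end

local function random_program()
   local program = {}
   for i = 1, PROGRAM_LEN do program[i] = random_instruction() end
   return program
end

-- Run a linear program: every register starts as the input x (as a float,
-- to avoid integer wrap-around); the result is left in register 1.
local function run(program, x)
   local reg = {}
   for i = 1, NUM_REGS do reg[i] = x * 1.0 end
   for _, ins in ipairs(program) do
      reg[ins.dst] = OPS[ins.op](reg[ins.a], reg[ins.b])
   end
   return reg[1]
end

-- Fitness: total absolute error against the target x*x + x (lower is better).
local function fitness(program)
   local err = 0
   for x = 1, 5 do err = err + math.abs(run(program, x) - (x * x + x)) end
   return err
end

-- Copy a program and replace one randomly chosen instruction.
local function mutate(program)
   local child = {}
   for i, ins in ipairs(program) do child[i] = ins end
   child[math.random(#child)] = random_instruction()
   return child
end

local population = {}
for i = 1, POP_SIZE do population[i] = random_program() end

-- Steady-state tournament: the loser of each random pairing is replaced
-- by a mutated copy of the winner.
for round = 1, 2000 do
   local i, j = math.random(POP_SIZE), math.random(POP_SIZE)
   if fitness(population[i]) > fitness(population[j]) then i, j = j, i end
   population[j] = mutate(population[i])
end

local best = math.huge
for _, program in ipairs(population) do best = math.min(best, fitness(program)) end
print("best error after evolution:", best)
```

With a bytecode generator in place of `random_instruction`, the individuals would be real VM functions and `run` would simply call them.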

Anyway, that's my happy relationship with Lua as it stands, and if any of this interests anyone besides me, I'd be happy to elaborate.

Cheers,
Richard

[1] https://eldorado.tu-dortmund.de/handle/2003/20098