2011/6/23 HyperHacker <hyperhacker@gmail.com>:
> I was just wondering to myself if LuaJIT could ever be hacked to
> function as a Lua compiler, that compiles scripts to standalone
> executables or libraries. It'd be great if we could take advantage of
> its speed and the FFI to write Lua extensions in Lua that would
> normally have to be written in C... :)

Hi,

I've been thinking about this too, and I guess it will not happen,
both because of how the JIT compiler works and because of the design
of Lua as a programming language.

The reason is that LuaJIT2 is a trace compiler. As far as I
understand, it works by letting the program run in interpreted mode,
analyzing the code as it executes and generating optimized native
code for the running application. This way of working is very
different from a classical optimizing compiler, like those for C or
C++, where optimization is based only on static analysis of the code
and the type signatures of variables and functions are heavily used to
produce optimal machine code.

LuaJIT2 would never be able to produce optimized code without running
the code, because it never performs static analysis of Lua code. I
don't know whether, in theory, you could save the generated optimized
code to a file so that it can be executed later without trace
analysis. Maybe this is possible, but it is probably a very difficult
engineering task, and the interest in having it is low: the JIT can
always produce optimized code on the fly very quickly, for every
architecture you may run on, so there is not much interest in
producing an executable.
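
To make the "observe at run time, then specialize" idea concrete, here
is a toy sketch in Python. It is not how LuaJIT2 is implemented; the
names make_specialized_add, generic and specialized are made up for
illustration. It only shows that the fast path is built from types
seen during execution and is protected by a guard that falls back to
the slow path when the assumption no longer holds.

```python
def make_specialized_add(sample_a, sample_b):
    """Build a fast path specialized to the types seen at run time (toy model)."""
    seen = (type(sample_a), type(sample_b))

    def generic(a, b):
        # Slow path: stands in for the interpreter and handles any types.
        return a + b

    def specialized(a, b):
        # Guard: the fast path is valid only for the observed types.
        if (type(a), type(b)) != seen:
            return generic(a, b), "fallback"
        return a + b, "fast"

    return specialized


add = make_specialized_add(1, 2)   # types observed while "running"
r1 = add(3, 4)                     # same types: guard passes
r2 = add("x", "y")                 # different types: guard fails
```

The point is that the information used to build the fast path (the
observed types) simply does not exist before the program has run.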

Another remarkable difference between JIT code and statically
optimized C code is that in C all the code is uniformly optimized, in
the sense that every fragment of code is fully optimized according to
the optimization flags. In JIT-compiled code this is different: LJ2
produces highly optimized code for the inner loops, while for the
outer loops the optimizations are much less aggressive or even absent,
since some code can be left in interpreted mode if the heuristic
decides that it is not critical.
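
The kind of heuristic described above can be sketched as a simple
hotness counter. This is only an illustration in Python: the class
ToyJit, the threshold value and the "compiled" set are assumptions
made for the example, not LuaJIT2's real data structures or numbers.

```python
HOT_THRESHOLD = 3  # hypothetical value, chosen only for the example


class ToyJit:
    def __init__(self):
        self.counts = {}       # how many times each code fragment has run
        self.compiled = set()  # fragments considered hot enough to compile

    def run(self, name, fn, *args):
        if name not in self.compiled:
            self.counts[name] = self.counts.get(name, 0) + 1
            if self.counts[name] >= HOT_THRESHOLD:
                # A real JIT would record and compile a trace here.
                self.compiled.add(name)
        return fn(*args)


jit = ToyJit()
for i in range(10):
    jit.run("inner", lambda x: x * 2, i)  # runs often, becomes hot
jit.run("outer", lambda: None)            # runs once, stays interpreted
```

After this runs, only "inner" is marked as compiled; "outer" never
reaches the threshold and stays in the interpreted path.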

The dualism between static analysis (as with C) and trace analysis is
strictly tied to the dualism between programming languages with
static and dynamic typing. Languages of the latter kind, like Lua or
Python, do not lend themselves to static compilation by design,
because you don't have any information about what type of value a
variable can contain; that information is only available at run time.
The table lookup mechanism that is used extensively in Lua and Python
also defeats static analysis, because the information is available
only at run time and everything can change during execution.
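
A small Python illustration of both points (the same applies to Lua
tables; the names scale, ops and call are made up for the example).
Nothing in the source text of scale or call tells a compiler which
types or which function will be involved.

```python
def scale(obj):
    # The type of obj["size"] is unknown until the call actually happens.
    return obj["size"] * 2


r_int = scale({"size": 21})     # integer multiplication
r_float = scale({"size": 1.5})  # floating-point multiplication
r_str = scale({"size": "ab"})   # string repetition

# Table lookup: the function behind a key can be replaced while running.
ops = {"f": lambda x: x + 1}


def call(x):
    return ops["f"](x)


before = call(1)
ops["f"] = lambda x: x * 10     # rebinding at run time
after = call(1)
```

The same call site ends up doing three different operations in scale,
and call(1) returns different results before and after the rebinding.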

JavaScript belongs to the same family as Lua and Python, and it also
has a JIT compiler. It is interesting to note how clearly this dualism
is now defined: C/C++ with strong type systems and optimizing
compilers on one side, and Lua, Python and JavaScript with dynamic
typing, associative arrays and very advanced JIT compilers on the
other. These two approaches are two different responses to the need to
produce optimized code from a high-level programming language. I
believe this duality will stay for a long time, and I wonder when a
new programming paradigm will emerge to replace both of them. From my
point of view there is strong demand for one: programming in C or C++
requires taking care of a lot of small details manually, and even a
small error can have catastrophic consequences, like your application
disappearing suddenly (segmentation fault) or memory leaks that eat
all your memory.

Programming languages like Lua or Python are very attractive because
they let people work on the logic of their applications and allow much
more rapid development of complex applications. The other side of the
coin is that these languages are very difficult to optimize: because
everything is dynamic, they defeat any static analysis, and only JIT
trace compilers are possible.

Another approach, midway between the two, is the Java approach, which
has both a strong static type system and a VM with JIT code
generation. From my point of view this is the *wrong* solution because
it takes the worst of each world. The type system and the rigid
class-based system with explicit exception declarations make the code
very verbose and cumbersome to write, taking away a lot of the
programmer's time. On the other side, you don't even get static
optimization but only JIT code generation that, while effective, eats
a lot of resources in terms of memory and execution time.

The only reason why it was and is so successful is that managers like
it: you can have a lot of contracts explicitly declared in order to
use modular programming techniques, and it is really *safe* in the
sense that it cannot crash at any moment just because you forgot to
check a null pointer or something like that. (A small parenthesis
about this latter point: this advantage is often overestimated,
because while it is true that Java cannot crash, it is also true that
you can have unhandled exceptions, or exceptions that are
inappropriately handled, resulting in non-functional behaviours.)

So Java is both boring to write, with a lot of boilerplate
declarations and so on, and largely suboptimal as an execution
environment, with a lot of resources required and the omnipresent GC.

I hope my email was not boring, even if it is certainly OT. I've taken
the opportunity of this thread to share some reflections about
optimized code generation and programming languages.

-- 
Francesco