LuaJIT Roadmap 2011
-------------------

LuaJIT is a Just-In-Time (JIT) Compiler for Lua. It's compatible with
standard Lua 5.1 and can significantly boost the performance of your
Lua programs. LuaJIT is open source software, released under the MIT/X
license.

LuaJIT is available from: http://luajit.org/

This is the LuaJIT roadmap for 2011, bringing you up to date on the
current and future developments around LuaJIT. I'm happy to answer
your questions here on the Lua mailing list, on reddit or by mail.

* Current Status
* FFI Library
* Dual-number VM
* Sponsored ARM port
* Outlook on LuaJIT 2.1
* Release Schedule

Current Status
--------------

LuaJIT 2.0.0-beta5 has proven to be quite stable. Thus I've held back
on releasing new betas in the past five months and worked on various
new features and improvements.

Barring unforeseen difficulties, LuaJIT 2.0.0-beta6 will be released
in the next 1 or 2 weeks. It would be helpful to get early feedback
from testers before the release. Thank you in advance! The LuaJIT git
repository is available from: http://luajit.org/download.html

Here are the main changes between beta5 and beta6:

- The sponsored port of the LuaJIT interpreter to the PowerPC/e500v2
  cores is now complete. The hand-optimized assembler code of the
  interpreter has been rewritten for the PPC/e500 dialect. It takes
  advantage of several architectural features, e.g. vectorized loads
  and compares are used to speed up dynamic type checks.

  The speedup over the Lua interpreter is 2x - 4x and in some cases
  up to 6x. This is similar to the speedups seen on x86/x64
  when comparing the pure interpreters (select the interpreters on
  http://luajit.org/performance.html ). Further gains are only
  possible with a port of the JIT compiler.

  Please note that the e500v2 has a different FPU than most other
  PowerPC CPUs. This port will *NOT* run on other PPC-based machines
  (e.g. game consoles)! A port of the JIT compiler and/or a port to
  other PowerPC CPUs may follow later.

  As a side-effect of the port, overall portability of the code base
  and cross-compilation support have been further improved.

- The long-awaited LuaJIT FFI library has been merged into the code
  base. Please see the next section for details.

- Various minor features from Lua 5.2 have been added:
  - Hex escapes and '\*'-escape in string literals.
  - string.format("%q", str) is fully reversible.
  - "%g" character class in patterns.
  - Tighter check on table.sort callback compliance.
  - os.exit(status|true|false [,close]).
  - __pairs/__ipairs metamethods (needs -DLUAJIT_ENABLE_LUA52COMPAT).

  Note that LuaJIT 2.0 has supported other features since its first
  release that later went into Lua 5.2, e.g. bit operations and a
  fully resumable VM (yield across pcall).

  Most other changes in Lua 5.2 cannot be merged into the LuaJIT code
  base, because they break compatibility with the Lua 5.1 API/ABI.
  This is not acceptable to the majority of my user-base. Given that
  Lua 5.2 provides few tangible benefits, adoption will likely be
  rather slow. So LuaJIT will stay compatible with the Lua 5.1 API/ABI
  in the near future.

- Changes to the core parts of the VM:
  - Specialized bytecode for pairs()/next(). Speedup: 3.5x.
  - The parser recognizes 64 bit integer literals (1LL, 1ULL) and
    complex literals (1.5i) for use by the FFI library.
  - The bytecode can embed these literals, too.

- The JIT compiler has seen some smaller improvements:
  - Calls to vararg functions are compiled.
  - select() is compiled.
  - Alias analysis has been improved, esp. for loads from allocations.
  - Various compiler heuristics have been tuned.

As you can see from the above list, LuaJIT 2.0.0-beta6 is a 'feature
release'. It'll likely need quite a few fixes and will be followed by
beta7 in Q1/2011, which is focused on stability.
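
A few of the Lua 5.2 additions listed above can be tried directly from
Lua code. The following is a minimal sketch, assuming a LuaJIT build
that includes these features; the sample strings are made up for
illustration only:

```lua
-- Hex escapes in string literals: "\x41" is the byte 0x41 ("A").
local s = "\x41\x42\x43"
assert(s == "ABC")

-- "%g" character class: printable characters except space.
local word = string.match("  hello world", "%g+")
assert(word == "hello")

-- string.format("%q", str) is reversible: loading the quoted form
-- yields the original string, including embedded zeros and newlines.
local orig = "tab\there\nzero\0end"
local quoted = string.format("%q", orig)
local load_chunk = loadstring or load  -- loadstring in Lua 5.1/LuaJIT
local back = load_chunk("return " .. quoted)()
assert(back == orig)
```

The __pairs/__ipairs metamethods are left out of the sketch, since
they depend on the -DLUAJIT_ENABLE_LUA52COMPAT build option.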


FFI Library
-----------

The FFI library allows calling external C functions and the use of C
data structures from pure Lua code.

The FFI library largely obviates the need to write tedious manual
Lua/C bindings in C. It doesn't require learning a separate binding
language -- it parses plain C declarations, which can be cut-n-pasted
from C header files or reference manuals (*). It's up to the task of
binding large libraries without the need for dealing with fragile
binding generators.

The FFI library is tightly integrated into LuaJIT (it's not available
as a separate module). The code generated by the JIT-compiler for
accesses to C data structures from Lua code is on par with the code a
C compiler would generate. Calls to C functions can be inlined in
JIT-compiled code, unlike calls to functions bound via the classic
Lua/C API.

(*) In case anyone wonders: Yes, this means the FFI library includes
a full-blown C parser (actually C99 + GCC/MSVC extensions). It's
currently missing a C pre-processor. Some C++ features are supported,
too. But complete C++ support is not coming anytime soon. :-)

Preliminary documentation for the FFI library is available in the git
repository. The Lua mailing list recently had some related threads,
too. Here are just a few examples to whet your appetite:

Using standard POSIX library functions, which are not provided by Lua:

  local ffi = require("ffi")
  ffi.cdef[[
  int mkdir(const char *pathname, unsigned int mode);
  int rmdir(const char *pathname);
  ]]
  ffi.C.mkdir("/tmp/testdir", 0x1ff)  -- 0x1ff is octal 0777
  ffi.C.rmdir("/tmp/testdir")

Popping up a message box on Windows:

  local ffi = require("ffi")
  ffi.cdef[[
  int MessageBoxA(void *w, const char *txt, const char *cap, int type);
  ]]
  ffi.C.MessageBoxA(nil, "Hello world!", "Test", 0)

Wrapping an external library (libz):

  local ffi = require("ffi")
  ffi.cdef[[
  int uncompress(uint8_t *dest, unsigned long *destLen,
                 const uint8_t *source, unsigned long sourceLen);
  ]]

  local zlib = ffi.load("z")

  local function uncompress_string(comp, origsize)
    local buf = ffi.new("uint8_t[?]", origsize)
    local buflen = ffi.new("unsigned long[1]", origsize)
    assert(zlib.uncompress(buf, buflen, comp, #comp) == 0)
    return ffi.string(buf, tonumber(buflen[0]))
  end

The FFI library allows you to create and access C data structures from
pure Lua code. Of course the main use for this is for interfacing with
C functions. But they can be used stand-alone, too.

E.g. I've converted SciMark for Lua to use the low-level FFI data
structures with a sizeable gain in performance. The results for GCC,
JVM and LuaJIT+FFI are only a few percent apart. More details can be
found here: http://lua-users.org/lists/lua-l/2010-12/msg00924.html
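
As a small illustration of stand-alone use, the following sketch
creates an array of C structs and works on it from Lua without calling
any C function. The point_t type and the values are hypothetical and
only serve as an example; the code needs LuaJIT with the FFI library:

```lua
local ffi = require("ffi")

-- point_t is a made-up type for this example.
ffi.cdef[[
typedef struct { double x, y; } point_t;
]]

-- A variable-length array of three structs, zero-initialized.
local n = 3
local pts = ffi.new("point_t[?]", n)

-- C arrays are indexed from zero.
for i = 0, n-1 do
  pts[i].x = i
  pts[i].y = 2 * i
end

local sum = 0
for i = 0, n-1 do
  sum = sum + pts[i].x + pts[i].y
end
assert(sum == 9)

-- Two doubles, stored unboxed and without padding.
assert(ffi.sizeof("point_t") == 16)
```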

Full support for all C data types implies that LuaJIT now supports
64 bit integers and complex numbers, as well as the corresponding
number literals (1LL, 1ULL, 1.5i).
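
A short sketch of these literals in use, assuming LuaJIT with the FFI
library (plain Lua 5.1 rejects the LL/ULL/i suffixes at parse time).
The constants are arbitrary and chosen to show values that a double
cannot hold exactly:

```lua
-- A double has a 53 bit mantissa, so 2^53 + 1 rounds back to 2^53.
assert(2^53 + 1 == 2^53)

-- The same value is exact as a 64 bit integer literal.
local a = 9007199254740993LL          -- 2^53 + 1
assert(a - 9007199254740992LL == 1LL)
assert(tostring(a) == "9007199254740993LL")

-- Unsigned 64 bit literal, here the largest representable value.
local u = 0xffffffffffffffffULL
assert(tostring(u) == "18446744073709551615ULL")

-- Imaginary literal: a complex number with real part 0.
local c = 1.5i
assert(c.re == 0 and c.im == 1.5)
```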

Please note that some parts of the FFI are still incomplete. Some
issues, like support for 64 bit arithmetic for all backends, will be
fixed before beta6. But others, like complex arithmetic, will have to
wait. Also, the JIT compiler currently doesn't compile every corner
case of FFI operations: it bails out and transparently falls back to
the interpreter. You can check this with the -jv command line option.

The need to allocate heap objects for carrying C data types may cause
some inefficiencies. Most of these will be resolved with the addition
of generalized allocation sinking and store sinking optimizations to
the JIT compiler. However, this feature will not make it into beta6.

The FFI library has been carefully designed to be extensible. E.g. the
FFI library will probably gain support for native vector operations or
for parsing a subset of C++. Development will continue in parallel to
other parts of LuaJIT. New features will be prioritized based on
user-demand and sponsoring.


Dual-number VM
--------------

The Lua language is specified to have a single number type. Currently
LuaJIT only supports 64 bit IEEE-754 compliant FP numbers ('double').
This works just fine for x86/x64 platforms with their excellent
floating-point performance. A unified number representation has many
advantages and the JIT compiler can get away with narrowing only some
select operations to integer arithmetic.

However, this approach is unlikely to yield acceptable performance on
lower-end CPUs for mobile or non-desktop/non-server platforms. Most of
these CPUs either support only software floating-point arithmetic or
have slow hardware FPUs.

As a prerequisite for the ARM port (see the next section), dual-number
capability will be added to the LuaJIT VM, the LuaJIT interpreter and
the JIT compiler.

Numbers will be internally kept as 32 bit integers, wherever possible,
and transparently widened to floating-point numbers. This change is
invisible at the Lua source code level. It's expected that carefully
written applications for low-end platforms will be able to avoid
floating-point computations with only a few changes to the source code.
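
As a rough sketch of what such a source-level change could look like
(an illustration under assumptions, not a description of the actual
implementation): a division may produce a fractional result, whereas
subtraction, shifts and addition on small integers only produce
integers. The function names are hypothetical, and whether a given
operation really stays in the integer domain is up to the VM:

```lua
local bit = require("bit")

-- Midpoint via division: the division by 2 may yield a fractional
-- result, so it is a floating-point operation.
local function mid_fp(lo, hi)
  return math.floor((lo + hi) / 2)
end

-- Midpoint via subtraction, shift and addition: every intermediate
-- result is an integer (assumes 0 <= lo <= hi < 2^31).
local function mid_int(lo, hi)
  return lo + bit.rshift(hi - lo, 1)
end

assert(mid_fp(10, 21) == 15)
assert(mid_int(10, 21) == 15)
```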

Adding dual-number support to the LuaJIT VM is a major change. For
stability reasons, this feature needs to be prototyped first for the
existing x86/x64 port of LuaJIT (even though it's not that useful for
this platform). Work on the actual ARM port of LuaJIT can only start
after the dual-number support is complete.


Sponsored ARM port
------------------

I'm happy to announce that QUALCOMM Inc. is sponsoring an ARM port
of LuaJIT 2.0. My personal thanks go to Marc Nijdam, who arranged
the sponsorship!

The initial targets for the ARM port are low-to-middle-end ARM-based
devices. The port will require a CPU conforming to the ARMv5
architecture (ARM9E cores or better) with software floating-point
(no FPU needed) and the classic ARM instruction set.

The initial port ought to run on upwards-compatible hardware, but
possibly with suboptimal performance. An ARM port of LuaJIT which
makes use of the VFP unit (hardware FPU) or other instruction set
extensions may follow at a later point in time.

LuaJIT/ARM will compile out-of-the-box for a GCC-based toolchain
targeting Linux/ARM-based systems. Other operating systems will
be supported through an enhanced porting layer which abstracts
away OS-specific functionality. This is mainly about memory
management and the specific needs of a JIT compiler. The goal is
to allow easier embedding of LuaJIT in custom OS environments.

The port will be done in three phases:

Phase #1: Dual-number support for LuaJIT, prototyped for x86/x64.
This is a basic requirement for the softfp ARM port. Please see
the previous section for details.

Phase #2: ARMv5/softfp port of the LuaJIT interpreter.

Phase #3: ARMv5/softfp port of the LuaJIT JIT compiler.

You can follow the progress in the LuaJIT git repository as usual.
The ARM port will take several months, so there may be interim beta
releases which already include part of the functionality.


Outlook on LuaJIT 2.1
---------------------

LuaJIT 2.0 has been in beta for more than a year now. Not that this
is unheard of in the industry. :-) The main reason is not a lack of
stability -- in fact the beta releases are successfully used in
production environments.

But the "beta" label gives me the ability to freely add features and
to just go ahead with bigger redesigns of the code base. There are
still a couple of features I want to include in LuaJIT 2.0, but of
course I have to make a cut somewhere.

My current plan is to freeze the LuaJIT 2.0 branch sometime in 2011
and get a release candidate out. The 2.0 branch will turn into the
stable branch and will receive only bug fixes.

Shortly after that, development on LuaJIT 2.1 will start. All of the
minor changes that didn't make it into 2.0 will go on the TODO list
for 2.1, of course. I'll update you on the details when the actual
switch happens.

The one major change that will likely happen first is a new garbage
collector for LuaJIT 2.1. I've already experimented with this on 2.0,
but it turned out to cause too much instability for the code base.

The standard Lua 5.1/LuaJIT 2.0 garbage collector is just not up to
the task of handling big heaps. And both its allocation speed and the
collector throughput leave something to be desired. So I'm planning to
switch to an integrated allocator and garbage collector. It's going to
be an incremental, generational, non-copying GC.

Naturally the main user-visible effect will be performance gains in
allocation-heavy workloads. Some of the related changes, like morphing
metatables into specialized data types on-the-fly or segregated
finalizer handling, will allow giving tables or other objects __gc
metamethods, too.


Release Schedule
----------------

 1-2 weeks - Release of LuaJIT 2.0.0-beta6 (features)
   Q1 2011 - Release of LuaJIT 2.0.0-beta7 (stability)
Q1-Q2 2011 - ARM port of LuaJIT 2.0
Q1-Q3 2011 - Some more beta releases for LuaJIT 2.0
   Q3 2011 - Release candidate of LuaJIT 2.0
Q3-Q4 2011 - Release of LuaJIT 2.0.0 final
   Q4 2011 - Work on LuaJIT 2.1 starts

Please note this is a tentative schedule, for your orientation only!
I cannot give you any guarantee whatsoever for the correctness of the
release dates.

--Mike