Re: ANN: LuaJIT 1.1.0

lua-l archive
Subject: Re: ANN: LuaJIT 1.1.0
From: Mike Pall <mikelu-0603@...>
Date: Wed, 15 Mar 2006 01:50:45 +0100
Hi,

Paul Chiusano wrote:
> What are your future plans for LuaJIT,

This depends on the feedback I'll receive from LuaJIT users.

* "Make it produce faster code" is one rather obvious goal. But
I have to know which area to target first.

E.g. Adam made some comparisons between Lua code and equivalent C
code. He sent me a few code snippets which show exactly what's
slow and what needs to be tuned. This is very helpful and I
encourage other users of LuaJIT to do the same. Please note that
I cannot analyze complete applications -- small, to-the-point
code snippets (without complex dependencies) are best.

* Another goal is better portability (to non x86 CPUs). I think
embedded CPUs would benefit most. I had a cheap Linux-based
DSL/VoIP router here for a few days (switched my parents' home
over to VoIP). This cute little thing (size of a sandwich loaf)
runs Linux on a 200 MHz MIPS32 CPU with 8 or 16 MB RAM. It's
adequate when used with compiled C code, but interpreted Lua runs
really slowly. The tiny cache and the lack of out-of-order
execution are a killer for interpreters.

IMHO Lua is the only scripting alternative due to severe size
constraints (2 or 4 MB flash is really tight). MIPS32 code would
also run on the PS2 or PSP, which will still play a role in the
game market for a while. ARM is an interesting target for other
embedded devices and PDAs (XScale).

This box and other embedded systems would benefit greatly from
LuaJIT. I'm self-employed and would rather work on LuaJIT than
other (less interesting) projects. So this is the plea:

  I'm actively looking for sponsors who want to see LuaJIT ported
  to their favourite CPU. If you are a big company or have the
  necessary funds to pay a developer for several months, please
  contact me by mail. I will keep all negotiations confidential.
  The result of the port has to be available as open source of
  course.

[Another option is the GPL + commercial license route (like MySQL),
but I'm not sure this would work out.]

> and how fast do you think a just-in-time compiler for Lua could be?

Only the sky is the limit. No, seriously, it's more a matter of
how much work one is able to put into the compiler. GCC and other
top performing compilers have seen many years of coordinated
development effort. And there are lots of research papers on how
to optimize C or Java code. But the good papers on optimizing
dynamic languages are few and far between.

Right now LuaJIT is at the point where all the low-hanging fruit
has been picked. Any further performance gains will only be
incremental and will take comparatively more work.

The real limit is how much free (or paid) time I can spend
working on LuaJIT. I just don't know at this point in time.
And I have some other Lua projects on the back-burner, too.

> Also, I'm curious: what are
> the real sources of slowness for a dynamically-typed language like Lua
> -  is it mostly instruction decoding,

This is only relevant for the interpreter.

> or is it having to resolve things at run time (like figuring
> out what function to call for the expression 'a + b'),

This is quite easy in Lua because most opcodes have only one
dominant receiver class. Even the interpreter inlines the number
case for arithmetic opcodes.

The LuaJIT optimizer is pretty good at detecting monomorphism.
The new adaptive deoptimization support in LuaJIT 1.1.0 makes
backing down in case of undetected polymorphism relatively cheap.
Aggressive optimizations can be done without compromising Lua
semantics. I think I've covered all of the commonly used
monomorphic cases for opcodes now.
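
To illustrate the "one dominant receiver class" point, here is a
minimal sketch in plain Lua. It is only an illustration of the language
semantics, not of LuaJIT internals; the names add, Vec and vec are made
up for this example.

```lua
-- Common (monomorphic) case: both operands are numbers, so the
-- arithmetic opcode can take its fast inline path.
local function add(a, b)
  return a + b
end

-- Rare (polymorphic) case: a table with an __add metamethod.
-- Vec and vec are hypothetical names used only for this example.
local Vec = {}
Vec.__add = function(u, v)
  return setmetatable({ x = u.x + v.x, y = u.y + v.y }, Vec)
end

local function vec(x, y)
  return setmetatable({ x = x, y = y }, Vec)
end

local n = add(1, 2)                  -- number + number
local v = add(vec(1, 2), vec(3, 4))  -- dispatches to Vec.__add
```

The same add function serves both calls. A compiler that assumes the
number case has to be able to back out when the second kind of call
shows up.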

> the lack of inlined functions (I mean pure Lua functions),

This depends on the coding style. I'm not sure about the overall
effect in most Lua apps. It's probably not so dominant for the
Lua interpreter because other overhead shadows it.

OTOH in typical OO-intensive Smalltalk or Self programs one
really needs to do function inlining to reach acceptable speeds.

It's on my TODO list for LuaJIT, but I think other optimizations
would pay off more and should be done first.

Inlining many standard library functions (C functions) in LuaJIT
1.1.0 paid off a lot. But this is partly due to the reduced call
overhead, partly due to specialization and partly because of
direct access to internal structures.

> function call overhead,

This is pretty low for an interpreter (if compared to other
interpreters). But it's relatively high when you compare LuaJIT
to other compilers.

The main reason is that LuaJIT still uses the Lua frame and stack
structures. This makes it easy to switch between interpreted and
compiled code. And most of the debug support can be reused, too.

Reducing the function call overhead any further is hard without
major conceptual changes. Inlining short Lua functions may be
easier (and is potentially faster).
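
As a rough sketch of what inlining a short Lua function would amount
to, here is the transformation done by hand at the source level. The
function names are invented for this example; it shows the idea only,
not how a compiler would implement it.

```lua
-- A short helper called from a hot loop: each iteration pays for a
-- full Lua call (frame setup, argument passing, return).
local function square(x)
  return x * x
end

local function sum_squares_call(n)
  local sum = 0
  for i = 1, n do
    sum = sum + square(i)
  end
  return sum
end

-- The same loop with the helper's body inlined by hand.
local function sum_squares_inline(n)
  local sum = 0
  for i = 1, n do
    sum = sum + i * i
  end
  return sum
end
```

Both versions compute the same result. The second one simply has no
call left in the loop body.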

> What do you think the performance limits are for just-in-time
> compilation in Lua?

* Lua has only a single number type. This simplifies many things
and even using a double doesn't make much of a difference for the
interpreter. But now that many other things have been optimized,
it shows in LuaJIT. Array indexing is slow (compared to C)
because it needs too many type conversions (double <-> int) and
bounds checks.

Narrowing numbers to integers with help from the optimizer is one
way to go. Dual number support (int + double) would have benefits
for embedded CPUs (lacking FPU hardware). But it's tricky to get
this fast for the interpreter and even more so for compiled code.
I guess pure integer support is too limiting for most embedded
projects (but would be really fast). [I need feedback on this
topic from people who use Lua on embedded devices.]
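
The conversions and checks behind array indexing are visible from plain
Lua. A small sketch (variable names are arbitrary):

```lua
-- In stock Lua 5.1 every number is a double, including array indices.
local t = { 10, 20, 30 }

-- A double key with an integral value reaches the array slot, but only
-- after a check that the double really holds an integer.
local hit = t[2.0]

-- A fractional key can never be an array slot, so the lookup misses.
local frac = t[2.5]

-- An index past the end is caught by a bounds check and yields nil
-- instead of reading stray memory as it could in C.
local past = t[4]
```

Here hit is 20, while frac and past are both nil. None of these checks
exist for a C array indexed with an int.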

* Lua has only a single generic container type (tables). Again this
simplifies many things and has little impact on the interpreter.
But it puts a limit on what can be optimized in a JIT compiler
with only local knowledge. Struct accesses (obj.foo, obj.bar)
always need a hash lookup (unlike in languages with static
typing). The full metamethod semantics come at a price, too.
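
A short sketch of why a field access cannot be treated as a fixed
offset load. The names point, defaults and lookups are made up for
this example.

```lua
-- A "struct" is just a table; obj.foo is sugar for obj["foo"],
-- i.e. a lookup with a string key.
local point = { x = 1, y = 2 }
local same = (point.x == point["x"])

-- With a metatable, a missing key falls through to __index, so even a
-- simple field read may end up running arbitrary code.
local lookups = 0
local defaults = setmetatable({}, {
  __index = function(t, key)
    lookups = lookups + 1
    return 0
  end,
})

local z = defaults.z   -- key absent: __index is called, yields 0
defaults.z = 7         -- now a real field
local z2 = defaults.z  -- plain lookup, the metamethod is not called
```

Whether defaults.z runs the metamethod depends on the contents of the
table at run time, which is hard to know with only local information.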

* Caching globals and method lookups is difficult. A seemingly
trivial statement like y = math.sqrt(x) needs two hash table
lookups and several type checks and contract verifications to
come to the point where the FP square root instruction (fsqrt)
can be safely inlined. This overhead cannot be avoided without
compromising language semantics (maybe the semantics need to be
augmented). Manually caching often used functions is common
practice in Lua (local sqrt = math.sqrt). But this doesn't work
out so well for obj:method() calls.
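
The manual caching idiom, and the reason a compiler cannot just apply
it silently, can be sketched like this (function names are made up for
the example):

```lua
-- Uncached: every call looks up the global "math" and then the field
-- "sqrt" before the function can be called.
local function norm_uncached(x, y)
  return math.sqrt(x * x + y * y)
end

-- Cached: the function value is captured once in a local (an upvalue
-- inside norm_cached), so both lookups drop out of the hot path.
local sqrt = math.sqrt
local function norm_cached(x, y)
  return sqrt(x * x + y * y)
end

-- The cache is a snapshot: replacing math.sqrt later is only seen by
-- the uncached version.
local original = math.sqrt
math.sqrt = function(x) return -original(x) end
local a, b = norm_uncached(3, 4), norm_cached(3, 4)
math.sqrt = original   -- restore the real function
```

After this runs, a is -5 and b is 5. The two versions differ exactly
when someone replaces math.sqrt, and that is the case the language
semantics force a compiler to keep checking for.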

* Type checks and other contract verifications are cheap on
modern x86 CPUs. They execute in the integer unit parallel to the
FP intensive main code with out-of-order execution. But the
overhead would be noticeable on embedded CPUs. Many redundant
checks could be removed or hoisted out of loops. Arithmetic
operations could be combined.
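
Hoisting a check out of a loop can be pictured at the source level.
This is only an analogy: the checks meant above are implicit ones in
the generated code, and the function names here are invented.

```lua
-- Naive: the type check is repeated on every iteration although
-- `scale` never changes inside the loop.
local function scale_all_naive(t, scale)
  for i = 1, #t do
    if type(scale) ~= "number" then
      error("scale must be a number")
    end
    t[i] = t[i] * scale
  end
  return t
end

-- Hoisted: the loop-invariant check runs once, before the loop.
local function scale_all_hoisted(t, scale)
  if type(scale) ~= "number" then
    error("scale must be a number")
  end
  for i = 1, #t do
    t[i] = t[i] * scale
  end
  return t
end
```

Note that the two differ for an empty table: the naive version never
reaches its check, the hoisted one always does. A compiler has to
preserve that kind of detail when it moves checks around.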

* Garbage collection and heap allocation put Lua at a speed
disadvantage compared to languages with manual memory management. The
impact is smaller in Lua than in other dynamic languages because of
typed-value storage and immutable shared strings. Adding a custom memory
allocator to the Lua core could be beneficial. Complex solutions
like escape analysis are not on my radar for LuaJIT (yet).

Bye,
     Mike