(multi-reply follows)

David Given wrote:
> ...incidentally, has anyone asked for an ARM version yet?

There have been various requests for ARM, x64, PPC and MIPS
ports. LuaJIT 2.x will be easier to port, but this is still a
large undertaking. Won't happen before the x86 version is stable.

Asko Kauppi wrote:
> Mike: do you think similar speed benefits (about 5x) as for the x86
> technology would also be reachable on other CPUs?

This is hard to estimate before coding it. It's well known that
interpreter main loops written in C are tough for most embedded
CPUs lacking out-of-order execution, good branch prediction, fast
L1D caches and store-to-load forwarding buffers. OTOH more free
registers on RISC CPUs surely help.
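
For readers unfamiliar with the shape of such a loop, here is a minimal
sketch of a switch-dispatched interpreter in C. The opcodes, the Insn
layout and toy_run() are all hypothetical and far simpler than the real
Lua VM; the point is only to show where the costs mentioned above come
from: every instruction goes through one shared indirect branch, and the
virtual registers live in memory, so each operation is a chain of loads
and stores.

```c
#include <stdint.h>

/* Hypothetical toy opcodes; these are NOT Lua's real bytecodes. */
enum { OP_LOADI, OP_ADD, OP_SUB, OP_JNZ, OP_RET };

/* Hypothetical instruction layout: opcode plus three 8-bit operands. */
typedef struct { uint8_t op, a, b, c; } Insn;

/* Virtual registers live in an array in memory (index masked to 0..7). */
#define R(x) reg[(x) & 7]

int32_t toy_run(const Insn *code) {
    int32_t reg[8] = {0};
    const Insn *pc = code;
    for (;;) {
        Insn i = *pc++;                 /* fetch */
        switch (i.op) {                 /* one shared, hard-to-predict branch */
        case OP_LOADI:                  /* R[a] = signed 8-bit immediate b */
            R(i.a) = (int8_t)i.b;
            break;
        case OP_ADD:                    /* R[a] = R[b] + R[c], wrapping */
            R(i.a) = (int32_t)((uint32_t)R(i.b) + (uint32_t)R(i.c));
            break;
        case OP_SUB:                    /* R[a] = R[b] - R[c], wrapping */
            R(i.a) = (int32_t)((uint32_t)R(i.b) - (uint32_t)R(i.c));
            break;
        case OP_JNZ:                    /* if R[a] != 0, jump by signed offset b */
            if (R(i.a) != 0) pc += (int8_t)i.b;
            break;
        case OP_RET:                    /* return R[a] */
            return R(i.a);
        default:                        /* unknown opcode: bail out */
            return -1;
        }
    }
}
```

A store to R(a) in one instruction is typically reloaded by the next one,
which is where store-to-load forwarding matters.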

My current guess is that the gain relative to the standard Lua
interpreter will be _higher_ for ARM/MIPS/in-order-PPC than for x86.

Another data point: I've almost completed the new interpreter
core for LuaJIT 2.x. It's written in carefully tuned assembler
code, it uses a more efficient bytecode, it's optimized to reduce
branch prediction misses and cache misses and to improve
scheduling and instruction-level parallelism. First experiments
show it to be sometimes up to 2x-3x faster than the standard Lua
interpreter (written in C). This is just for the interpreter!

To be honest: I didn't expect this. I've always thought the code
generated for the main interpreter loop by C compilers was so-so,
but not that it made that much of a difference. I rewrote it in
assembler primarily because of the need for efficient escapes
from the compiled traces to the interpreter and back, not for the
speed gain (that's what the trace compiler is for).

David Given wrote:
> I'd quite like ARM because I run a couple of ARM devices which are quite 
> slow. The JIT would form a very easy way of improving performance.

Sure, even a (rather low) 2x gain on a low-end system might be
much more useful than a 5x gain on a high-end system.

About the lua_Number problem on FPU-less CPUs: a mixed FP/int
model has considerable overhead. The sheer number of type and
range checks makes this expensive. Some calculations could be
narrowed down to integers by the compiler, but this is tricky.
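
To make the overhead concrete, here is a rough sketch of what a single
addition could look like under a mixed FP/int model. MixedNum,
mixed_int(), mixed_flt() and mixed_add() are names made up for this
illustration, not anything from Lua or LuaJIT. Even the all-integer fast
path needs two type checks and a range check, and the fallback path uses
doubles, which would be emulated in software on an FPU-less CPU.

```c
#include <stdint.h>

/* Hypothetical tagged number: either a 32-bit integer or a double. */
typedef enum { T_INT, T_FLT } Tag;
typedef struct {
    Tag tag;
    union { int32_t i; double d; } u;
} MixedNum;

static MixedNum mixed_int(int32_t i) {
    MixedNum v; v.tag = T_INT; v.u.i = i; return v;
}

static MixedNum mixed_flt(double d) {
    MixedNum v; v.tag = T_FLT; v.u.d = d; return v;
}

static double mixed_to_double(MixedNum v) {
    return v.tag == T_INT ? (double)v.u.i : v.u.d;
}

MixedNum mixed_add(MixedNum a, MixedNum b) {
    if (a.tag == T_INT && b.tag == T_INT) {          /* two type checks */
        int64_t wide = (int64_t)a.u.i + (int64_t)b.u.i;
        if (wide >= INT32_MIN && wide <= INT32_MAX)  /* range check */
            return mixed_int((int32_t)wide);
        return mixed_flt((double)wide);              /* overflow: widen to FP */
    }
    /* At least one operand is a double: take the (slow) FP path. */
    return mixed_flt(mixed_to_double(a) + mixed_to_double(b));
}
```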

I think it's possible to reach near raw integer performance with
fixed point arithmetic (32.31 + one tag bit). And integer
narrowing is far easier, too. Not sure whether fixed point
arithmetic fits the bill for everyone, though (opinions?).
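
As a rough illustration of why this can run at near integer speed, here
is one possible reading of "32.31 + one tag bit" in C. The exact bit
layout is an assumption made for this sketch (integer part in bits
63..32, fraction in bits 31..1, tag in bit 0 with 0 meaning "number"),
and all names are hypothetical. With that layout, adding two numbers is
a plain 64-bit add and the tag bit stays 0 without extra work. The
double conversions are only there for demonstration; multiplication
would need a wider intermediate and is left out.

```c
#include <stdint.h>

/* Assumed layout: bits 63..32 integer, 31..1 fraction, bit 0 tag. */
typedef int64_t Fix;

#define FIX_ONE ((Fix)1 << 32)   /* the value 1.0 in this encoding */

static int fix_is_number(Fix v) { return (v & 1) == 0; }

static Fix fix_from_int(int32_t n) { return (Fix)n * FIX_ONE; }

/* Truncates toward zero. */
static int32_t fix_to_int(Fix v) { return (int32_t)(v / FIX_ONE); }

/* Demonstration only; d must be within the 32-bit integer range. */
static Fix fix_from_double(double d) {
    return (Fix)(d * 4294967296.0) & ~(Fix)1;   /* clear the tag bit */
}

static double fix_to_double(Fix v) { return (double)v / 4294967296.0; }

/* One 64-bit add: 0 + 0 in the tag position stays 0. Done on unsigned
   values so that overflow wraps; the conversion back to int64_t assumes
   two's complement, as on all mainstream compilers. */
static Fix fix_add(Fix a, Fix b) {
    return (Fix)((uint64_t)a + (uint64_t)b);
}
```

Under these assumptions an integer-only loop never touches the fraction
bits at all, which is what would make integer narrowing comparatively
easy.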

Doug Currie wrote:
> Please include both signed and unsigned field extraction as per my
> plea (rant?) at

I will -- I guess I should think about a better API anyway.
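
For reference, the difference between the two kinds of extraction can be
sketched in C as below. The function names and the (word, pos, width)
signature are hypothetical and say nothing about what the final API will
look like. The unsigned variant masks the field; the signed variant
additionally sign-extends from the field's top bit.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `width` bits starting at bit `pos` (bit 0 = least significant). */
uint32_t extract_unsigned(uint32_t word, unsigned pos, unsigned width) {
    assert(width >= 1 && width <= 32 && pos <= 32 - width);
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    return (word >> pos) & mask;
}

/* Same field, but interpreted as a two's complement signed value. */
int32_t extract_signed(uint32_t word, unsigned pos, unsigned width) {
    uint32_t field = extract_unsigned(word, pos, width);
    uint32_t sign = UINT32_C(1) << (width - 1);   /* top bit of the field */
    /* (field ^ sign) - sign propagates the sign bit upwards (mod 2^32);
       the final cast assumes a two's complement int32_t. */
    return (int32_t)((field ^ sign) - sign);
}
```

For example, the 4-bit field at bit 4 of 0xF0 reads as 15 unsigned but
as -1 signed.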