Hi,

David Given wrote:
> I'm not sure a Lua-to-C compiler would help --- Lua's so dynamic that you 
> can't actually convert a Lua expression into a single machine code routine. 
> Even something as simple as "a + b" could do *anything*, depending on what 
> types a and b were, what their metatables were defined as, the enclosing 
> scopes, etc.

But you can take a good guess at what the most common types for
a specific opcode are. This allows you to inline the common
case, provided you first check for the expected type (and fall
back to a generic handler if that check fails).

E.g. here is 'x=x+2' (OP_ADD with 1 variable and 1 constant):

080731A0  cmp dword [ebx+0x8],byte +0x3   // LUA_TNUMBER in stack slot #0?
080731A4  jnz 0x80731ba                   // No, use fallback code
080731A6  fld qword [0x8074450]           // Operand #2: FP constant '2'
080731AC  fadd qword [ebx]                // Operand #1: stack slot #0
080731AE  fstp qword [ebx]                // Destination: stack slot #0
080731B0  ...

// POST code segment:
080731BA  lea ecx,[ebx]                   // Operand #1
080731BC  mov edx,0x8074450               // Operand #2
                                          // Destination: BASE += 0 omitted
080731C1  mov eax,0x5                     // TM_ADD
080731C6  mov dword [esi+0x1c],0x806e9c8  // L->ctx = &pt->code[pc+1]
080731CD  call 0x806e928                  // Call fallback handler
080731D2  jmp short 0x80731b0             // Continue with next opcode

That's only 5 (five!) instructions in the fast path. And this
is even without data flow analysis, which might allow you to
remove the check (2 instructions less). But it probably doesn't
pay off here since the forward branch is statically predicted as
'not-taken' and the FP pipeline hides some of the latency.
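
In case the x86 is hard to read, here is a rough C-level sketch
of what that fast path amounts to. The names (Value, TNUMBER,
fallback_add) are made up for illustration and are not the
actual Lua or LuaJIT internals:

/* Illustrative model of the specialized 'x=x+2' code above. */
typedef struct {
  int type;        /* runtime type tag */
  double n;        /* numeric payload, valid only if type == TNUMBER */
} Value;

enum { TNUMBER = 3 };   /* matches the 0x3 checked in the listing */

/* Generic handler: coercions, metamethods, errors (elided here). */
static void fallback_add(Value *dst, const Value *o1, const Value *o2)
{
  (void)dst; (void)o1; (void)o2;
}

static const Value k2 = { TNUMBER, 2.0 };   /* the constant '2' */

void add_x_2(Value *x)
{
  if (x->type == TNUMBER) {      /* guard: the cmp/jnz pair */
    x->n = x->n + k2.n;          /* fld + fadd + fstp */
  } else {
    fallback_add(x, x, &k2);     /* the POST code segment */
  }
  /* fall through to the next opcode */
}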

Similar things can be done for x..y (probably strings) or *x
(probably a table or a string) and so on ...

And (most important) you can inline the whole hashtable lookup
for t.foo (aka t["foo"]) in a few instructions ("foo" is a constant
and you know both the hash value and the TString pointer at
compile time). This makes table-based dispatch competitive with
static method dispatch (like in compiled C++).
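
To make that concrete, here is a simplified sketch of such an
inlined lookup. Node, Table, generic_index and the baked-in key
and hash are an illustrative model of a chained hash table, not
Lua's real structures; the point is that the fast path is one
index, one pointer compare and a load:

typedef struct Value Value;   /* illustrative value type */

typedef struct Node {
  const void *key;            /* interned string pointer */
  Value *val;
  struct Node *next;          /* collision chain, only the fallback walks it */
} Node;

typedef struct Table {
  Node *node;                 /* hash part */
  unsigned int sizemask;      /* size of hash part minus one (power of two) */
} Table;

/* Generic lookup: chain walk, metamethods etc. (elided). */
static Value *generic_index(Table *t, const void *key)
{
  (void)t; (void)key;
  return 0;
}

/* "foo" is a constant, so its interned pointer and hash value are
   known when the code is generated and can be baked in. */
static const char foo_interned[] = "foo";
#define KEY_FOO  ((const void *)foo_interned)
#define HASH_FOO 0x2c8e1u     /* illustrative hash of "foo" */

Value *get_t_foo(Table *t)
{
  Node *n = &t->node[HASH_FOO & t->sizemask];  /* main position: no hashing at runtime */
  if (n->key == KEY_FOO)                       /* pointer compare (strings are interned) */
    return n->val;
  return generic_index(t, KEY_FOO);            /* miss: fall back to the generic lookup */
}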

> All naive compilation would gain is saving the overhead of doing 
> the instruction decode. (Which may still be worthwhile, of course --- how 
> much overhead is there?)

Instruction decoding is only one part of the overhead for a
bytecode interpreter. The other major problem is the unpredictable
branch for the opcode dispatch (C switch statement).
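
For comparison, a bare-bones model of such a switch-based
dispatch loop (not the real luaV_execute); the decode plus the
single shared indirect branch is the per-instruction overhead:

enum { OP_LOADBOOL, OP_FORLOOP, OP_RETURN };

void run(const unsigned char *code)
{
  const unsigned char *pc = code;
  for (;;) {
    unsigned char op = *pc++;   /* instruction decode */
    switch (op) {               /* typically a jump table: ONE indirect branch     */
    case OP_LOADBOOL:           /* shared by all opcodes, which mispredicts        */
      /* ... */                 /* whenever the opcode sequence isn't trivially    */
      break;                    /* repetitive, e.g. when two opcodes alternate     */
    case OP_FORLOOP:            /* as in the benchmark below                       */
      /* ... */
      break;
    case OP_RETURN:
      return;
    }
  }
}

Compiled machine code has neither the decode nor the dispatch
branch: each opcode's fast path is emitted inline, back to back.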

Here's the timing on a code snippet which alternates between
two opcodes (LOADBOOL and FORLOOP):

$ TIMEFORMAT="%U"
$ time lua -e 'local x; for i=1,1e8 do x = true end'
6.106
$ time luajit -e 'local x; for i=1,1e8 do x = true end'
0.893

The machine code is almost 7 times faster. So yes, it _does_ pay off.

> However, a dynamically recompiling JIT like Python's Psyco would
> probably make Lua fly (even more). Unfortunately, these require
> deep knowledge of the black arts to make work effectively...

Python has a plethora of different types. And almost all of
them can be used with any opcode. It's a lot more difficult
to statically predict the types for (say) a[b]. A dynamically
specializing compiler (such as Psyco) is the only realistic
approach for Python.

With Lua one can do far better with far less effort. The reduced
type variability pays off here, too.


Ok, you read past all this, so here is the hidden message:

I'm working on LuaJIT, but didn't want to announce it before
I was ready to release the code. I've gotten pretty far, but
there's still a lot to do. Initial results are very promising
(see above). ETA for the first release is 8/2005. Please be
patient. Thank you.

Bye,
     Mike