lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Speaking of ver. 2, what's the status, if any, on 64 bit support?

I've said this before, open a paypal account, I'll gladly fork over some cash for you to get a 64bit cpu to work with,
and I'm sure others will too..

Mike Pall wrote:
David Given wrote:
Is it still necessary to cache globals in local variables for decent
performance, or can I expect the JIT to optimise this for me?

For example:

i = 0
while (i < 10000) do i = i + 1 end


local i = 0
while (i < 10000) do i = i + 1 end

I wouldn't call this a typical example of caching.  Using local
variables for loop counters is just common sense. :-)

But the basic idea is of course the same as in this more typical

local s = 0; for i=1,1000 do s = s + math.sin(i) end
local sin = math.sin; local s = 0; for i=1,1000 do s = s + sin(i) end

LuaJIT 1.x makes a very literal translation of the bytecode to
machine code. So you still need to cache table lookups for optimal
performance of inner loops.

LuaJIT 2.x (no, it's not ready for release) knows how to hoist
table lookups of all kinds out of loops. So the above examples
with math.sin have the same performance.

And to show the general case, the following four loops have
identical performance, too:

local x=1;     for i=1,n do x=x+i end        -- local
x=1;           for i=1,n do x=x+i end        -- global (hash lookup)
local t={1};   for i=1,n do t[1]=t[1]+i end  -- array lookup
local t={x=1}; for i=1,n do t.x=t.x+i end    -- member field (hash lookup)

Of course this just another silly example where a local would be
the most logical choice. But these cases can arise in practice in
OO-centric programs (self.count = self.count + 1) and as a
side-effect of heavy inlining.

Vyacheslav Egorov wrote:
I do not think that LuaJIT violates Lua semantics. It is not possible
to determine live range of the given global in the presence of
functional environments, metatables, dynamic code loading.

It's possible if the complex bytecode instructions are broken up
into smaller operations. The combination of store-to-load
forwarding (in linear control flow and across loop iterations) and
store sinking can remove almost all redundant aggregate access.

This is the same effect that traditional static compilers can
achieve with scalar replacement of aggregates. And since alias
analysis is much easier in Lua (no pointers) even a simplistic
disambiguation algorithm can be far more effective.

Metatables and other dynamisms pose no problem either. A direct or
indirect metatable access is just a special load or store to the
underlying object. If the alias analysis doesn't find a conflicting
store inside the loop (setmetatable to the same object), it can
safely hoist all metatable loads out of the loop. Even if it finds
a store, it can often just forward it to the load and sink the
store. Ditto for function environments.

Dynamic code loading can introduce new closures, but the identity
of all closures is checked anyway. If a check later fails, the
trace will be extended with a side branch to handle the new

E.g. here's the SSA IR generated by LuaJIT 2.x for the uncached
math.sin loop:

0004  I Knum KNUM   1
0005  I Knum KNUM   1000
0006 >P  num SLOAD  #2            i
0007  I Kstr KSTR   "math"
0008  I Kfun KFUNC  0x8072c88
0009 >I  fun SLOAD  #0            (current closure)
0010 >I  fun EQ     0009 0008
0011  I  tab FLOAD  0008 cl_env   getfenv()
0012 >I Kstr HREF   0011 0007
0013 >I  tab HLOAD  0012          math
0014  I Kstr KSTR   "sin"
0015 >I Kstr HREF   0013 0014
0016 >I  fun HLOAD  0015          math.sin
0017  I Kfun KFUNC  0x80714a0
0018 >I  fun EQ     0016 0017     check for math.sin closure identity
0019     num SIN    0006
0020 >P  num SLOAD  #1            s
0021     num ADD    0019 0020
0022     num ADD    0006 0004
0023 >   num LE     0022 0005
0024     num PHI    0006 0022     i
0025     num PHI    0020 0021     s

1st column: IR instruction number (implicit SSA ref)
2nd column: I: invariant, P: left PHI operand, >: guard.
3rd column: K: constant, num/str/fun/tab/...: typed IR
4th column: IR operation
5th/6th column: IR operands (SSA refs or literals)

As you can see almost all operations are invariant and will be
hoisted out of the loop. Scalar stores are implicit and not
represented in the IR because all of them can be sunk. The only
remaining operations inside the loop are SIN, two ADDs and a LE
comparison. The only guard left inside the loop is the loop exit
condition. This is the minimum number of operations -- a C compiler
couldn't generate any better code.

[ And for the specialists: no, there's no metatable lookup for the
function environment or for the math table because the type checks
implied by the loads suffice. If a table value is not nil, neither
an __index nor a __newindex metamethod can ever be called. ]