- Subject: Re: Potential performance improvements for Lua 5.2 beta
- From: Joshua Jensen <josh.jjensen@...>
- Date: Fri, 05 Aug 2011 07:44:00 -0600
----- Original Message -----
From: Roberto Ierusalimschy
Date: 8/4/2011 12:00 PM
First, on my Core i7 720 laptop, turning off #define MS_ASMTRICK
saves a bit of time.
Does the code use the "regular" IEEE trick in that case
(LUA_IEEE754TRICK)? That used to cause problems with DirectX. Does
anyone know how is this problem currently?
No LUA_IEEE754TRICK, although that is very useful.
Okay, with those out of the way, I hit lvm.c. For my particular
benchmarks, the VM executes luaV_lessthan() and luaV_lessequal() a
lot. Adopting the Lua 5.1 type equality check helps out
considerably.
[...]
Do you have any explanation to that? (Do the benchmarks execute
a lot of luaV_lessthan/luaV_lessequal for numbers or strings?)
I don't have any explanation. I have the assembly code in front of me
for both. Lua 5.2's implementation looks more efficient, but in one of
my benchmarks, I gained roughly 0.4 seconds with the Lua 5.1
implementation.
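
For readers without both sources at hand, the two comparison shapes under discussion can be sketched roughly as follows. This is a toy illustration, not code from lvm.c: `Value`, `num`, `str`, `less_tag_first`, `less_pairwise` and `less_checked` are hypothetical names. It also assumes that the 5.2 beta version tests each operand's type per branch; the thread itself only confirms that 5.1 uses a type equality check.

```c
#include <assert.h>
#include <string.h>

/* Toy tagged value. This is NOT Lua's real TValue. */
typedef enum { T_NUMBER, T_STRING } Tag;
typedef struct {
    Tag tag;
    union { double n; const char *s; } u;
} Value;

static Value num(double n)      { Value v; v.tag = T_NUMBER; v.u.n = n; return v; }
static Value str(const char *s) { Value v; v.tag = T_STRING; v.u.s = s; return v; }

/* Shape A ("type equality check"): compare the two tags once, then
 * dispatch on the left operand only. Returns -1 for a type mismatch,
 * where a real VM would raise an order error. */
static int less_tag_first(const Value *l, const Value *r) {
    if (l->tag != r->tag) return -1;
    if (l->tag == T_NUMBER) return l->u.n < r->u.n;
    return strcmp(l->u.s, r->u.s) < 0;
}

/* Shape B (assumed per-branch style): test both operands' types in
 * each branch, falling through to the mismatch case at the end. */
static int less_pairwise(const Value *l, const Value *r) {
    if (l->tag == T_NUMBER && r->tag == T_NUMBER) return l->u.n < r->u.n;
    if (l->tag == T_STRING && r->tag == T_STRING)
        return strcmp(l->u.s, r->u.s) < 0;
    return -1;
}

/* Runs both shapes and checks that they agree before returning. */
static int less_checked(const Value *l, const Value *r) {
    int a = less_tag_first(l, r);
    assert(a == less_pairwise(l, r));
    return a;
}
```

Both shapes return the same answers; any speed difference comes down to how the compiler lays out the branches, which is why the assembly alone may not explain the timing.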
I then used an Instruction *pc and tried to update ci->u.l.savedpc
at key intervals. I did this quickly. It might be wrong, but the
performance improvement was huge.
How much is "huge"?
Ah, I forgot to post numbers.
Here are some timings I quickly threw together for one of the
benchmarks (attached below). It does a prime number calculation.
Unlike the original mail, I left the LUA_NANTRICKLE #define on, since I
wasn't doing a head to head comparison against Lua 5.1.
Lua 5.2 beta:      28.14 seconds
LUA_IEEE754TRICK:  27.64 seconds (was curious... and I leave it on for the rest of these numbers)
PC #1:             27.23 seconds (this is the Instruction** pc version)
PC #2:             25.89 seconds (this is the Instruction* pc version, which may not be correct for everything but is for this benchmark)
"Huge" is subjective, but the Instruction* version bought nearly 2 seconds.
Picking through the assembly shows the reason why. Not only is there
one less dereference for access to pc, but the runtime keeps 'pc' loaded
in a register. In Lua 5.2 beta, there are a number of additional
instructions for storing something back into a memory space and then
loading, say, 'L' into a register. While I can't say for certain, it
appears as if 'L' remains loaded in a register in the Instruction* version.
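
The idea behind the Instruction* variant can be sketched with a toy interpreter. None of this is taken from lvm.c: `Frame`, `helper`, `run_through_frame`, `run_cached_pc`, the opcodes and `demo_code` are all hypothetical. The first loop fetches through a pointer stored in memory on every instruction. The second keeps a local `pc` and writes it back only at the points where other code may look at the saved value. Whether a compiler really keeps the local in a register depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Instr;  /* toy encoding: opcode in low 8 bits, operand above */
enum { OP_ADD = 0, OP_CALL = 1, OP_HALT = 2 };
#define MK(op, arg) ((Instr)(((arg) << 8) | (op)))
#define OPCODE(i)   ((i) & 0xFFu)
#define ARG(i)      ((i) >> 8)

/* Toy frame; savedpc is analogous in spirit only to ci->u.l.savedpc. */
typedef struct {
    const Instr *savedpc;
    long acc;
} Frame;

/* Stands in for anything that inspects the frame (error reporting,
 * hooks, ...) and therefore needs an up-to-date savedpc. It adds the
 * current instruction offset to acc, so a stale savedpc changes the
 * result. */
static void helper(Frame *f, const Instr *code) {
    assert(f->savedpc >= code);  /* savedpc must have been written back */
    f->acc += (long)(f->savedpc - code);
}

/* Variant 1: every fetch loads and stores the pc through the frame. */
static long run_through_frame(Frame *f, const Instr *code) {
    f->savedpc = code;
    f->acc = 0;
    for (;;) {
        Instr i = *f->savedpc++;
        switch (OPCODE(i)) {
        case OP_ADD:  f->acc += (long)ARG(i); break;
        case OP_CALL: helper(f, code); break;
        default:      return f->acc;  /* OP_HALT */
        }
    }
}

/* Variant 2: local pc, written back only at the "key intervals". */
static long run_cached_pc(Frame *f, const Instr *code) {
    const Instr *pc = code;
    f->acc = 0;
    for (;;) {
        Instr i = *pc++;
        switch (OPCODE(i)) {
        case OP_ADD:  f->acc += (long)ARG(i); break;
        case OP_CALL: f->savedpc = pc; helper(f, code); break;
        default:      f->savedpc = pc; return f->acc;  /* OP_HALT */
        }
    }
}

/* ADD 5, CALL, ADD 7, CALL, HALT */
static const Instr demo_code[] = {
    MK(OP_ADD, 5), MK(OP_CALL, 0), MK(OP_ADD, 7), MK(OP_CALL, 0), MK(OP_HALT, 0)
};
```

Both variants produce the same result for `demo_code`. The risk with the second one is the one already noted in the timings: if any path that reads the saved pc is reached without a preceding write-back, it sees a stale value.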
Josh
local primes = {}
primes[1] = 2
primes[2] = 3
local nprimes = 2

local function try( n )
    local i = 1
    while true do
        local prime_i = primes[i]
        if prime_i * prime_i >= n then break end
        if ( n % prime_i ) == 0 then
            return;
        end
        i = i + 1
    end
    primes[ nprimes ] = n
    nprimes = nprimes + 1
end

function main()
    for iter=1,100 do
        primes[1] = 2
        primes[2] = 3
        nprimes = 2
        local i = 1
        while nprimes < 25000 do
            local i6 = i * 6
            try( i6 - 1 )
            try( i6 + 1 )
            i = i + 1
        end
        print('--------->', collectgarbage('count'))
    end
end

collectgarbage()
print('--------->', collectgarbage('count'))
main()