- Subject: Re: LuaJIT 2 performance page
- From: Mike Pall <mikelu-1012@...>
- Date: Tue, 21 Dec 2010 11:23:49 +0100
Geoff Leyland wrote:
> On 21/12/2010, at 2:10 PM, Mike Pall wrote:
> > SciMark performance (-large and -small) for LuaJIT git HEAD:
>
> So am I right in thinking these show that Java > C ~= FFI > noFFI?
> (That's a "roughly equal to").
Yes, at least for SciMark -large. LuaJIT heavily optimizes inner
loops and generates near-optimal code for them. Outer loops
receive much less attention right now and it shows. SciMark -large
spends much more time in the inner loops, so LuaJIT beats GCC by a
couple of percent there, but loses by 10-15% on the small data set.
LuaJIT knows how to reassociate index expressions across loop
iterations; GCC 4.4 does not. The JVM uses unrolling to solve that
and is also better at turning conditionals into predicated code.
GCC is better than LuaJIT/x64 at eliminating 32-to-64-bit
sign-extensions for indexing. That about sums up the differences.
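To make the index reassociation point concrete, here is a hand-written C sketch of what such a transformation can buy. It rests on one plausible reading: once the compiler sees that in[j + 1] in one iteration is the same element as in[j] in the next, it can carry the value in a register instead of reloading it. The 1D stencil and the names stencil_naive and stencil_carried are made up for illustration. They are not SciMark code and not what LuaJIT actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: three loads per iteration.
 * Assumes in and out do not overlap. */
void stencil_naive(const double *in, double *out, size_t n)
{
    assert(n >= 3);
    for (size_t j = 1; j + 1 < n; j++)
        out[j] = in[j - 1] + in[j] + in[j + 1];
}

/* Hand-transformed form: in[j + 1] of iteration j is in[j] of
 * iteration j + 1, so two of the three values are carried over in
 * scalars and only one new load is needed per iteration. */
void stencil_carried(const double *in, double *out, size_t n)
{
    assert(n >= 3);
    double prev = in[0];
    double cur = in[1];
    for (size_t j = 1; j + 1 < n; j++) {
        double next = in[j + 1];   /* the only load in the loop body */
        out[j] = prev + cur + next;
        prev = cur;                /* rotate values for the next iteration */
        cur = next;
    }
}
```

Both functions add the same values in the same order, so they produce identical results. Only the number of loads per iteration differs. Unrolling, as mentioned for the JVM, reaches a similar effect by making the neighbouring accesses visible inside a single loop body.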
None of the code is auto-vectorized by any of the compilers. ICC
would probably do that, but it's such a hassle to install.
> Is the C version as well implemented as the java implementation?
Yes, I think so. The SciMark kernels are so small, there isn't
much cross-language variation you'd need to take into account.
> I'll have to rewrite the raytracer for FFI.
You're welcome! But performance will be disappointing right now,
because explicit allocations are neither traced (maybe tomorrow)
nor sunk (maybe next week).
Performance for "double[3]" vs. "struct { double x,y,z; }" should
be the same. Code for the former may be more easily transformed to
use low-level vectors later on.
[The FFI already has vector data types with valarray semantics,
but it doesn't compile them to real vector operations yet.]
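For reference, the two declarations look like this on the C side. The sketch assumes a mainstream ABI where three consecutive doubles get no padding. The C standard does not strictly guarantee that for the struct, so layouts_match() checks it at run time. All names here (vec3s, layouts_match, dot_array, dot_struct) are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } vec3s;

/* Returns 1 if the struct has the same size and field offsets as
 * double[3] on this platform (an assumption, verified at run time). */
int layouts_match(void)
{
    return sizeof(vec3s) == sizeof(double[3])
        && offsetof(vec3s, x) == 0
        && offsetof(vec3s, y) == sizeof(double)
        && offsetof(vec3s, z) == 2 * sizeof(double);
}

/* Dot product over the array form. */
double dot_array(const double a[3], const double b[3])
{
    assert(a != NULL && b != NULL);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Dot product over the struct form: same loads, same arithmetic. */
double dot_struct(const vec3s *a, const vec3s *b)
{
    assert(a != NULL && b != NULL);
    return a->x * b->x + a->y * b->y + a->z * b->z;
}
```

With identical layout, both functions come down to the same three pairs of loads at offsets 0, 8 and 16 (for 8-byte doubles), which fits the expectation that neither form is faster. The array form has a uniform element type and constant stride, which is presumably what makes it the easier candidate for a later mapping to vectors.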
> Any news on scalar replacement of aggregates?
I've improved alias analysis, so most operations are already
performed on scalars. But stores and allocations are not sunk yet.
I'll need to add generalized store + allocation sinking soon. E.g.
performance for complex numbers is really bad without this.
[Store forwarding + Store sinking + Allocation sinking = SRA]
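A hand-written before/after in C may help show why complex numbers suffer without this. The first function mimics a dynamic language where every intermediate complex value is a freshly allocated object. The second is what the same loop looks like once forwarding and sinking have removed all temporaries. This is a conceptual sketch only: the iteration z = z*z + c and the names cplx_new, iterate_boxed and iterate_scalar are assumptions for illustration, not LuaJIT internals or output.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double re, im; } cplx;

/* Allocate a new complex value; stands in for a boxed object. */
static cplx *cplx_new(double re, double im)
{
    cplx *z = malloc(sizeof *z);
    assert(z != NULL);
    z->re = re;
    z->im = im;
    return z;
}

/* "Before": two allocations, four stores and several reloads per
 * iteration, although no temporary outlives the loop body. */
double iterate_boxed(double cre, double cim, int n)
{
    cplx *z = cplx_new(0.0, 0.0);
    for (int i = 0; i < n; i++) {
        cplx *sq = cplx_new(z->re * z->re - z->im * z->im,
                            2.0 * z->re * z->im);
        cplx *next = cplx_new(sq->re + cre, sq->im + cim);
        free(sq);
        free(z);
        z = next;
    }
    double r = z->re * z->re + z->im * z->im;
    free(z);
    return r;
}

/* "After": loads are forwarded from the stores that feed them, and
 * the now-dead stores and allocations are gone. Only scalars remain. */
double iterate_scalar(double cre, double cim, int n)
{
    double re = 0.0, im = 0.0;
    for (int i = 0; i < n; i++) {
        double sre = re * re - im * im;
        double sim = 2.0 * re * im;
        re = sre + cre;
        im = sim + cim;
    }
    return re * re + im * im;
}
```

Both functions perform the same floating-point operations. The difference is purely in memory traffic and allocator calls, which dominate in the boxed version.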
--Mike