lua-l archive

Geoff Leyland wrote:
> Given that you said
> 
> > This release includes many fixes and performance enhancements,
> > e.g. for recursive code
> 
> what's the story with the binary-trees benchmark?  Is it a GC
> thing, a recursive thing or is it a bad benchmark?

It's mainly a GC benchmark and only incidentally uses recursion.

> -joff makes it a bit slower, and collectgarbage("stop") at the
> top (for instances that fit in memory) makes it a bit faster, but
> neither's a huge difference.

Since beta2 didn't compile recursion, it ran in the interpreter.
Now with beta3 it runs in (faster) native code, but the GC
overhead stayed the same. With higher N it spends 75% of its time
in the GC and the memory allocator (and suffers from lots of cache
misses).
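
To illustrate why this is mostly a GC workload, here is a minimal
sketch of the allocation pattern in a binary-trees style program.
It is not the shootout source: the names bottom_up_tree and
item_check and the default depth of 10 are assumptions chosen for
illustration. Each node is a freshly allocated table, so almost
all of the work lands in the allocator and the collector rather
than in the recursion itself.

```lua
-- Illustrative sketch only; function names and default depth are assumptions.

-- Build a complete binary tree; every node is a new table allocation.
local function bottom_up_tree(depth)
  if depth == 0 then return {} end
  depth = depth - 1
  return { bottom_up_tree(depth), bottom_up_tree(depth) }
end

-- Walk the tree and count its nodes (leaves are empty tables).
local function item_check(tree)
  if tree[1] then
    return 1 + item_check(tree[1]) + item_check(tree[2])
  end
  return 1
end

local N = tonumber(arg and arg[1]) or 10

-- collectgarbage("count") reports the heap size in KB.
local before = collectgarbage("count")
local tree = bottom_up_tree(N)
local nodes = item_check(tree)
local after = collectgarbage("count")

print(string.format("depth %d: %d nodes, heap grew by about %.0f KB",
                    N, nodes, after - before))
```

A tree of depth d holds 2^(d+1) - 1 tables, so the heap grows
quickly with N while the amount of computation per node stays
tiny.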

I could tune the existing GC a bit more, but I guess I'll need to
completely redesign the GC to score well on this benchmark. But
there's one caveat: this benchmark is not necessarily a good
predictor for typical GC performance.

I have many ideas for a GC redesign, but I realize this is a
bigger undertaking, so I'm postponing it until 2.1.

> On the other hand, since the shootout appears to judge on the
> median, it's probably k-nucleotide you want to make faster.

Actually it's picking fannkuch as the median. And although it
looks really simple, it's the hardest to optimize of them all
(for a trace compiler).

That said, the addition of structured binary data (part of the
work on the FFI) will speed up many of the remaining outliers.
E.g. reverse-complement suffers most from the lack of a mutable
byte-buffer. It would be a trivial program and easy to compile,
if only it could use such a feature.

But I'm not just targeting the current set of shootout benchmarks.
SciMark scores would improve with typed low-level buffers, too.

And since there are no (plain) recursive benchmarks on the shootout
anymore, you can't see that beta3 gave a huge speed boost here
(showing only the x86 results):

$ time lua fib.lua 37
Fib(37): 39088169
7.320
$ time luajit-2.0.0-beta2 fib.lua 37
Fib(37): 39088169
2.044
$ time luajit-2.0.0-beta3 fib.lua 37
Fib(37): 39088169
0.368

Now it's 20x faster than Lua on the dreaded recursive Fibonacci
benchmark. Similarly, ack (Ackermann function) is 23x faster.
Tail-recursion is now more or less the same speed as plain loops.
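
For readers without the old shootout sources, the following is a
rough sketch of what such a test might look like. The actual
fib.lua is not shown in this message, so the code is an
assumption: it uses fib(0) = fib(1) = 1, which is consistent with
the printed Fib(37) = 39088169. The sum_tail and sum_loop helpers
are hypothetical and only show the tail-recursive and plain-loop
forms of the same computation.

```lua
-- Assumed shape of fib.lua: plain doubly recursive Fibonacci
-- with fib(0) = fib(1) = 1 (an assumption, see above).
local function fib(n)
  if n < 2 then return 1 end
  return fib(n - 2) + fib(n - 1)
end

-- Hypothetical helper: tail-recursive sum of 1..n.
-- The recursive call is in return position, so it is a proper tail call.
local function sum_tail(n, acc)
  if n == 0 then return acc end
  return sum_tail(n - 1, acc + n)
end

-- Hypothetical helper: the same sum written as a plain loop.
local function sum_loop(n)
  local acc = 0
  for i = 1, n do acc = acc + i end
  return acc
end

-- A smaller default than 37 keeps the plain interpreter run short.
local n = tonumber(arg and arg[1]) or 27
print(string.format("Fib(%d): %d", n, fib(n)))
```

Timing sum_tail(n, 0) against sum_loop(n) for a large n under
both interpreters would be one way to check the tail-recursion
claim on your own machine.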

--Mike