lua-users home
lua-l archive


On Tuesday 02, Mike Pall wrote:
> Robert G. Jakabosky wrote:
> > One way llvm-lua could improve the speed of static compiled code would be
> > to add support for recording hints during runtime and writing those
> > hints to a file that would then be passed to the static compiler.
> Sure, feedback-directed optimization (FDO) would probably help a
> lot. But static compilers face some inherent difficulties when
> compiling a dynamic language:
> - Dynamic languages generate tons of code paths, even for simple
>   operations. A static compiler has to compile *all* code paths.
>   Apart from the overhead, this pessimizes the fast code paths,
>   too (esp. PRE and register allocation). A JIT compiler only
>   generates code for paths that are actually executed (this may
>   also differ between runs).
> - Even with FDO, static compilers have to be very conservative
>   about inlining and specialization to limit code growth. A JIT
>   compiler can be more aggressive, because it has to deal with a
>   much lower number of code paths.
> - Not even the best C compilers are able to hoist complex control
>   structures out of loops. Once you lower (say) a load from a hash
>   table (a loop following the hash chain) to the LLVM IR, you lose
>   the original semantics. This usually means you cannot optimize
>   it or eliminate it, either.

A JIT compiler still needs hooks to catch the slow paths, in case the 
function is called with a different type for one of its parameters or 
something else in the environment changes.  The JIT can then simply 
re-compile the code to handle the new path.

A static compiler cannot re-compile on the fly, but I think it would be 
possible to provide a slower fall-back copy of each function.  That requires 
a duplicate of every function, though in many cases the slow version would 
never need to run.
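The duplication idea can be sketched in plain Lua (hypothetical names; a real 
static compiler would emit the guard and both copies as machine code):

```lua
-- Sketch of the fast/slow duplication idea (hypothetical example).
-- A statically compiled function could ship in two versions: a fast
-- path specialized for the observed types, and a generic fall-back
-- that preserves full Lua semantics.
local function sum_generic(a, b)
  return a + b          -- handles metatables, string coercion, etc.
end

local function sum_fast(a, b)
  -- Guard: if the specialization no longer holds, take the slow copy.
  if type(a) ~= "number" or type(b) ~= "number" then
    return sum_generic(a, b)
  end
  return a + b          -- compiler can lower this to a raw machine add
end

print(sum_fast(2, 3))         --> 5  (fast path)
print(sum_fast("2", "3"))     --> 5  (falls back, string coercion applies)
```

The cost is the code-size doubling mentioned above, plus the guard check on 
every entry into the fast version.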

> That's not to discourage you from your work -- just to point out
> some difficulties you'll be facing with llvm-lua.

I agree that it will not be easy, but I don't shy away from difficult tasks; 
they are just more interesting puzzles.  I just wish I had more time to work 
on this project.

> The following trivial Lua function is a nice optimization challenge:
>   function f(x, n)
>     for i=1,n do x = math.abs(x) end
>     return x
>   end
> The inner loop has two hash table loads (_G["math"], math["abs"]),
> a function dispatch and a call to a standard library function.
> Here's the corresponding checklist for a Lua compiler in
> increasing order of complexity:
> LJ1  LJ2
> yes  yes  Type-specializes the hash table loads (object and key type).
> yes* yes  Inlines both hash table loads (* partially inlined by LJ1).
> no   yes  Specializes the loads to the hash slot (omit chain loop).
> yes  yes  Specializes to a monomorphic function dispatch for abs().
> yes  yes  Inlines the abs() function.
> no   yes  Hoists both hash table loads out of the loop.
> no   yes  Hoists the abs() out of the loop (abs(abs(x)) ==> abs(x)).
> no   yes  Emits an empty loop (could be optimized away, too).
> Now try to get 8x yes with a static compiler, while preserving the
> full Lua semantics (e.g. somebody might do math.abs = math.sin or
> _G might have a metatable etc.). Good luck! :-)

Yes, I really hate it when somebody messes with the math.* functions ;-)
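For what it's worth, the hoisting from that checklist is something a Lua 
programmer can do by hand, by caching the global lookups in a local -- which 
is exactly what the compiler has to prove it is allowed to do automatically:

```lua
-- Manual version of what LJ2 does automatically: hoist the two hash
-- table loads (_G["math"], math["abs"]) out of the loop by caching
-- the result in a local.
local abs = math.abs          -- both hash lookups happen once, here

local function f(x, n)
  for i = 1, n do
    x = abs(x)                -- plain local access inside the loop
  end
  return x
end

print(f(-3, 10))              --> 3
```

Of course, the caching is only valid because we assume nobody replaces 
math.abs afterwards -- precisely the assumption a compiler cannot make while 
preserving full Lua semantics.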

I have thought about providing a restricted mode for Lua that disallows some 
changes to _G and the standard libraries.  The restricted mode would let 
static compilers inline more code without needing some of the slow paths.  
As long as a script doesn't rely on weird tricks like that, it should be able 
to run under the restricted mode (which might also be useful to the 
sandboxing people).
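A toy sketch of what such a restricted mode could look like at the Lua level 
(hypothetical; rawset can still bypass __newindex, so a real restricted mode 
would need VM support):

```lua
-- Hypothetical restricted-mode sketch: expose a read-only proxy for a
-- library table, so a compiler could trust that math.abs never changes.
local function freeze(t)
  return setmetatable({}, {
    __index = t,              -- reads pass through to the real table
    __newindex = function()
      error("restricted mode: standard library tables are read-only", 2)
    end,
    __metatable = false,      -- block setmetatable/getmetatable tampering
  })
end

local math_ro = freeze(math)
print(math_ro.abs(-5))        --> 5
local ok = pcall(function() math_ro.abs = math.sin end)
print(ok)                     --> false (the write is rejected)
```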

> > Below are the results of running scimark-2008-01-22.lua [1] for a
> > performance comparison.
> >
> > # taskset -c 1 /usr/bin/lua scimark-2008-01-22.lua large
> The large dataset is an out-of-cache problem and cache thrashing
> (esp. for FFT) dominates the timings. IMHO the small dataset is
> better suited to compare the performance of the different VMs.

Attached is the run of the small dataset.  I had run both the small and 
large datasets, so the attached log is from the same time as the large 
dataset results.  I will try to remember to use the small dataset next time.

> Ok, so LJ2 is getting closer to C's performance. It's already on
> par with C for simpler benchmarks (like mandelbrot or spectralnorm).
> But there still remains a lot to do ...

Very nice.  It is good to know that it is possible to get that much 
performance out of Lua code.  I don't expect to reach that performance level 
with llvm-lua for a long time, if it is even possible.  I will evaluate 
LuaJIT 2 when it is released to see whether it fits my requirements, but 
until then I will be using llvm-lua.

Robert G. Jakabosky
# taskset -c 1 /usr/bin/lua scimark-2008-01-22.lua
Lua SciMark 2008-01-22, based on SciMark 2.0a. Copyright (C) 2008 Mike Pall.

FFT         6.34  [1024]
SOR        16.23  [100]
MC          3.98  
SPARSE     10.14  [1000, 5000]
LU         11.56  [100]

SciMark     9.65  [small problem sizes]

# taskset -c 1 llvm-lua scimark-2008-01-22.lua
Lua SciMark 2008-01-22, based on SciMark 2.0a. Copyright (C) 2008 Mike Pall.

FFT        17.75  [1024]
SOR        41.67  [100]
MC          9.87  
SPARSE     23.41  [1000, 5000]
LU         27.53  [100]

SciMark    24.05  [small problem sizes]

# taskset -c 1 ./scimark-2008-01-22
Lua SciMark 2008-01-22, based on SciMark 2.0a. Copyright (C) 2008 Mike Pall.

FFT        18.11  [1024]
SOR        39.75  [100]
MC          9.29  
SPARSE     22.26  [1000, 5000]
LU         26.10  [100]

SciMark    23.10  [small problem sizes]

# taskset -c 1 luajit -O3 scimark-2008-01-22.lua
Lua SciMark 2008-01-22, based on SciMark 2.0a. Copyright (C) 2008 Mike Pall.

FFT        28.92  [1024]
SOR        69.62  [100]
MC         22.15  
SPARSE     40.16  [1000, 5000]
LU         51.56  [100]

SciMark    42.48  [small problem sizes]