- Subject: Re: LuaJIT2 vs. vararg
- From: Vyacheslav Egorov <mister.aleph@...>
- Date: Thu, 26 Nov 2009 18:39:40 +0600
> Well, maybe in Alexander's example, because inlining and unrolling
> at the call site would work out there.
Ah, sorry, I misread Mark's example. He is not iterating over varargs at all...
> Unrolling usually only helps if the trip count is low and
> constant. The latter is often not the case with varargs.
I was explicitly reasoning about callsites with a _fixed_ number of arguments.
I presume that callsites for vararg functions usually have a fixed number of arguments. I do not have any statistics, so I might be wrong.
On Thu, Nov 26, 2009 at 6:07 PM, Mike Pall <firstname.lastname@example.org> wrote:
Vyacheslav Egorov wrote:
> If I were developing a "static" compiler for Lua, I would try to
> implement the following approach: always try to inline vararg
> functions at callsites with a fixed number of arguments, then try to
> unroll loops that iterate over varargs (apply a simple heuristic to
> find such loops: index variable is in range 1 .. #args, index variable
> is used in args[i]).
> I am curious: does it make sense? Can it be reformulated in terms of a
> tracing JIT?
Unrolling usually only helps if the trip count is low and
constant. The latter is often not the case with varargs.
And unrolling just multiplies the problem with type-variance. Now
you get not just one decision tree, but multiple slightly
> A tracing JIT gets inlining for free. But I don't see how to implement
> on-the-fly loop unrolling for this case.
There are already some unrolling heuristics in LJ2. The basic idea
is to give an outer loop the chance to unroll an inner loop with a
low trip count. See rec_loop_interp() in lj_record.c.
One could additionally penalize loops over varargs to provoke
unrolling. But it's unclear whether this helps the general case.
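The two points above (unrolling only pays off for low, constant trip counts; an outer loop can absorb an inner loop with a low trip count) can be illustrated with a toy model in Python. This is an assumption-laden sketch and not the logic of rec_loop_interp(). `record` decides whether an inner loop encountered while recording the outer loop is copied into the outer trace. The "penalize loops over varargs" idea is modelled, purely as an assumption, as an extra unroll allowance `VARARG_BONUS` rather than whatever mechanism LJ2 would actually use. `guard_outcomes` counts how often the unrolled trace's trip-count guard holds when the inner trip count varies. Both constants are made up.

```python
# Toy model only: these constants are invented, not LuaJIT parameters.
UNROLL_LIMIT = 4   # hypothetical max inner iterations copied into the outer trace
VARARG_BONUS = 4   # hypothetical extra allowance that pushes vararg loops toward unrolling


def record(observed_trip_count, iterates_varargs=False):
    """Decide what the outer loop's trace does with an inner loop it runs into.

    Returns the number of inner iterations copied (unrolled) into the outer
    trace, or None if the inner loop is left to become a trace of its own.
    """
    limit = UNROLL_LIMIT + (VARARG_BONUS if iterates_varargs else 0)
    return observed_trip_count if observed_trip_count <= limit else None


def guard_outcomes(recorded, actual_trip_counts):
    """Replay an unrolled trace: it stays valid only while the inner trip
    count equals the recorded one (the guard); any other count is a side exit.

    Returns (iterations that stay on the trace, side exits).
    """
    stay = sum(1 for n in actual_trip_counts if n == recorded)
    return stay, len(actual_trip_counts) - stay
```

With a constant trip count every outer iteration stays on the trace: `guard_outcomes(3, [3, 3, 3, 3])` gives `(4, 0)`. With varying counts, as is common for varargs, `guard_outcomes(3, [1, 2, 3, 4])` gives `(1, 3)`, so most iterations leave the trace. A larger allowance only lets more loops be unrolled; it does nothing about that guard.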
> (It seems that both Alexander's and Mark's examples will benefit from
> this optimization)
Well, maybe in Alexander's example, because inlining and unrolling
at the call site would work out there.
But Mark's example has an intermediate loop over an array that
cannot/should not be unrolled. This prevents specialization at the
call site of the 'broadcast' abstraction. As I've already said,
this is a bit too much to ask from a compiler.
This is summarized in the "Sufficiently Smart Compiler" argument: