

Krunal Rao wrote:
> However when vn is small: for vn=10 vsample_rng takes twice the time
> of sample_rng to draw the same number of samples, for vn=3 it takes it
> three times the time and for vn=2 it takes it 200 times the time.
> Is there any reason for this behavior or any way to fix it?

The 200x difference is probably a failure of the region selection
heuristics. I'd need to get a complete example (send directly to
me, not to the mailing list) to analyze this.

But the 2x-3x difference is not unexpected:

LuaJIT compiles a nested loop as two traces: the inner loop is
compiled first, with full loop optimizations (hoisting etc.). Then
the exit from the inner loop and the path around it in the outer
loop back to the inner loop is compiled as a side trace (with fewer
optimizations). Multiple inner loops are joined with multiple side
traces.

Every trace transition has a cost and the side traces are more
costly, since they are less optimized. If your inner loops are
long-running, these effects are not noticeable. But they can
dominate the performance for inner loops with a low iteration
count. There are other issues too, such as a higher number of
branch prediction misses.

Actually this isn't specific to LuaJIT. It's just more noticeable
in a dynamic language and with a trace compiler. Try a simple C
program, such as this (needs GCC):

#include <stdlib.h>

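int main(int argc, char **argv)
{
  int i, j, n, m;
  if (argc < 3) return 1;  /* usage: ./x <outer count> <inner count> */
  n = atoi(argv[1]);
  m = atoi(argv[2]);
  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
      __asm__ __volatile__("");  /* empty asm keeps GCC from deleting the loops */
  return 0;
}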

$ time ./x 10 500000000
$ time ./x 500000000 10
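
If you'd rather compare both loop shapes in one process instead of
using the shell's time, something like the following sketch could
work. The names run_loops, time_loops and compare_shapes are made up
for this illustration, and the iteration counter is an addition that
the program above does not have. It measures CPU time with clock()
and, like the program above, relies on GCC-style inline asm.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Run the nested loop and return the number of inner-loop iterations.
   The empty volatile asm (a GCC extension) must execute on every
   iteration, so the compiler cannot remove the loops. */
long run_loops(long n, long m)
{
  long i, j, count = 0;
  assert(n >= 0 && m >= 0);
  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++) {
      __asm__ __volatile__("");
      count++;  /* illustration only: lets callers check the trip count */
    }
  return count;
}

/* CPU seconds spent in run_loops(n, m), as reported by clock(). */
double time_loops(long n, long m)
{
  clock_t start = clock();
  run_loops(n, m);
  return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Print timings for both shapes: a few long inner loops,
   then many short inner loops with the same total iteration count. */
void compare_shapes(long few, long many)
{
  printf("%ld x %ld: %.2fs\n", few, many, time_loops(few, many));
  printf("%ld x %ld: %.2fs\n", many, few, time_loops(many, few));
}
```

Calling compare_shapes(10, 500000000) from main should correspond to
the two runs above: the total number of iterations is the same in both
cases, only the split between the outer and the inner loop changes.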