- Subject: Re: LuaJIT2 performance for number crunching
- From: CrazyButcher <crazybutcher@...>
- Date: Sun, 13 Feb 2011 19:18:52 +0100
2011/2/13 Francesco Abbate <email@example.com>:
> Well, for me there is some confusion here. It is true that you can use
> templates with the dimension of the system as a parameter to let the
> C++ compiler optimize more aggressively, but if you use just plain C
> loops with a non-constant bound the code will still be optimized and
> the difference in performance will be a minor one. I mean, an optimizing
> C/C++ compiler always optimizes *all* the code.
This sounds wrong to me. In C/C++ as well, there is a huge difference
between small loops that get unrolled and loops that need actual runtime
bound checks, especially if you do this for many small ops. It's really
embarrassing what difference that will make.
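
To make the distinction concrete, here is a minimal C++ sketch of the two variants under discussion. The names `dot_fixed` and `dot_dynamic` are made up for illustration and come from nobody's actual code. Whether and how much the compiler unrolls either loop depends on the compiler and its flags, so treat this as an illustration of the setup rather than a benchmark.

```cpp
#include <array>
#include <cstddef>

// Hypothetical example: dimension N is a template parameter, so the trip
// count is a compile-time constant. For small N the compiler is free to
// unroll the loop completely and drop the loop-bound check.
template <std::size_t N>
double dot_fixed(const std::array<double, N>& a,
                 const std::array<double, N>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hypothetical example: dimension n is only known at run time, so the
// generated code has to test i < n while iterating, even when n turns
// out to be tiny.
double dot_dynamic(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Calling `dot_fixed<3>(a, b)` and `dot_dynamic(a.data(), b.data(), 3)` on the same two 3-element arrays gives the same result; the only difference is how much the compiler knows about the loop bound when it generates code.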
> The problem here is
> that LuaJIT2 will only optimize the innermost loops and do almost
> nothing to optimize the code in the outer loop.
I am not sure what you mean here. The JIT will have optimized paths
for the various code paths it recognizes and thinks it's worth
Have you read through this?
> I can also imagine a case where the dimension of
> the problem is quite big, let's say 100. What about generating ~ 600
> local variables and unrolling a huge loop to let LuaJIT optimize it?
You do realize that at this magnitude the loop would become "hot"
again and no unrolling would be required? Just like your C compiler would
sometimes unroll code and sometimes leave the loop in, depending on