- Subject: Re: LuaJIT with vectors
- From: Mike Pall <mikelu-1011@...>
- Date: Thu, 18 Nov 2010 15:08:48 +0100
David Kastrup wrote:
> Alex Queiroz <asandroq@gmail.com> writes:
> > The standards-compliant switch statement must emit code to check
> > interval range.
>
> Not if the data type is covered exhaustively.
>
> > This code kills the CPU predictor and causes severe pipeline flushing.
>
> I have yet to see an alternative that can predict the opcodes.
Ick ... I'll try to clear up a few misconceptions:
The interval check is fully predictable. Few compilers are able to
eliminate it, even if the data type is covered. It's not covered
for Lua's dispatch anyway (38-something opcodes). And most
compilers are not smart enough to increase the size of the
dispatch table either.
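
To illustrate the interval check and the padded-table idea, here is a rough sketch done by hand with a function-pointer table. It is not how any particular compiler lowers a switch and not code from Lua or LuaJIT. All names (handler_fn, step_checked, step_padded, the two toy opcodes) are made up for the example, and the range designator is a GCC/Clang extension.

```c
#include <stdint.h>

typedef int (*handler_fn)(int acc);

/* Hypothetical toy opcodes, not Lua's instruction set. */
enum { OP_INC, OP_DOUBLE, NUM_OPS };

static int op_inc(int acc)     { return acc + 1; }
static int op_double(int acc)  { return acc * 2; }
static int op_invalid(int acc) { (void)acc; return -1; }

/* Table sized to the opcode count: the index needs a range check. */
static const handler_fn small_table[NUM_OPS] = { op_inc, op_double };

static int step_checked(uint8_t op, int acc)
{
    if (op >= NUM_OPS)              /* the interval check */
        return op_invalid(acc);
    return small_table[op](acc);
}

/* Table padded to 256 entries: every uint8_t value is a valid index,
   so no range check is needed before the indirect call. */
static const handler_fn padded_table[256] = {
    [OP_INC]            = op_inc,
    [OP_DOUBLE]         = op_double,
    [NUM_OPS ... 255]   = op_invalid,   /* GCC/Clang range designator */
};

static int step_padded(uint8_t op, int acc)
{
    return padded_table[op](acc);
}
```

Both variants give the same answer for every opcode byte. The padded one trades a larger table for dropping the compare-and-branch, which is predictable anyway.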
The indirect, table-based branch to the opcode is the real
problem. It's highly unpredictable since a switch () usually only
adds one dispatch point.
Using (indirect) threaded dispatch (e.g. with the labeled goto
extension of GCC or in assembler) replicates the dispatch. This
makes it much less likely to get branch prediction misses.
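
A minimal sketch of the difference, assuming a made-up four-opcode bytecode (the BC_* names and both run_* functions are hypothetical, not taken from Lua or LuaJIT). The first loop has a single dispatch point inside the switch. The second uses GCC's labels-as-values extension (also accepted by Clang), so every handler ends with its own indirect branch.

```c
#include <stdint.h>

/* Hypothetical toy bytecode for illustration only. */
enum { BC_INC, BC_DEC, BC_DOUBLE, BC_HALT };

/* Switch dispatch: all opcodes share one indirect branch. */
static int run_switch(const uint8_t *pc)
{
    int acc = 0;
    for (;;) {
        switch (*pc++) {
        case BC_INC:    acc++;    break;
        case BC_DEC:    acc--;    break;
        case BC_DOUBLE: acc *= 2; break;
        case BC_HALT:   return acc;
        default:        return -1;   /* unknown opcode */
        }
    }
}

/* Indirect threaded dispatch: the dispatch is replicated at the end of
   each handler. No range check here, so the bytecode must be valid. */
static int run_threaded(const uint8_t *pc)
{
    static const void *const labels[] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    int acc = 0;

#define DISPATCH() goto *labels[*pc++]
    DISPATCH();
do_inc:    acc++;    DISPATCH();
do_dec:    acc--;    DISPATCH();
do_double: acc *= 2; DISPATCH();
do_halt:   return acc;
#undef DISPATCH
}
```

Both functions compute the same result for the same program. Only the number and placement of the indirect branches differ, which is what gives the branch predictor separate history per handler.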
But you're all barking up the wrong tree: recent CPUs, e.g. the
Core2 from Intel, have special logic to discover such branches and
avoid pipeline flushes in exchange for a dependency stall. So the
effect of threaded dispatch is much less pronounced than it was on
older CPUs.
As you may know, LuaJIT's interpreter is roughly 3x faster than
Lua's interpreter (the JIT compiler plays in another league of
course). This is only in part due to the use of threaded dispatch.
There are many more reasons and they extend to the design of the
whole VM. I wrote a short summary here:
http://www.reddit.com/r/programming/comments/badl2/luajit_2_beta_3_is_out_support_both_x32_x64/c0lrus0
--Mike