> I (re)did some quick tests. For your particular test, I got a "speedup"
> of 5%. In general, I got "speedups" around 5~8%. With clang (3.6), I
> got "speedups" around 2% in all my (few) tests. (The quotes mean that I
> am not sure whether these speedups are real, that is, due only to this
> change and consistent among several compilers, versions, platforms,
> tests, etc.) It would be great if other people could report their
> results for diverse environments and tests.
Well, I still hope to get some feedback.
Anyway, I checked the code generated by gcc again. (This is gcc 4.8.4
with -O2.) It still collapses all the code generated by 'vmbreak' into
one single place. All opcodes end with an unconditional jump to this
place, and there it does the only computed goto (jmp *%rax) in the
code. So, it seems impossible to get any gains from branch prediction.
Besides that, the code with 'switch' also uses a jump table, so the only
concrete gain (other than compiler idiosyncrasies) seems to be the
elimination of a bounds check (which is just two instructions, cmpl/ja,
with no memory access and perfect branch prediction).
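
For anyone who wants to look at the generated code themselves, here is
a minimal sketch of the two dispatch styles being compared. It is not
the code from lvm.c: the toy accumulator machine, the opcode names, and
the functions run_switch/run_goto are made up for illustration, and only
the name 'vmbreak' is borrowed from the discussion. The second function
relies on the "labels as values" extension, so it assumes gcc or clang.

```c
/* Hypothetical opcodes for a toy accumulator machine (not Lua's). */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

/* Dispatch with 'switch': the compiler usually emits a bounds check
   on the opcode (cmp/ja) before indexing its own jump table. */
int run_switch(const unsigned char *pc) {
  int acc = 0;
  for (;;) {
    switch (*pc++) {
      case OP_INC:    acc++;    break;
      case OP_DEC:    acc--;    break;
      case OP_DOUBLE: acc *= 2; break;
      case OP_HALT:   return acc;
      default:        return -1;  /* unknown opcode */
    }
  }
}

/* Dispatch with computed goto: each opcode ends with its own indirect
   jump in the source, and there is no bounds check on the opcode
   (an invalid opcode is undefined behavior here). */
int run_goto(const unsigned char *pc) {
  /* Table of label addresses, indexed by opcode (GNU C extension). */
  static void *const disptab[] = { &&L_INC, &&L_DEC, &&L_DOUBLE, &&L_HALT };
  int acc = 0;
#define vmbreak goto *disptab[*pc++]
  vmbreak;                        /* fetch and jump to the first opcode */
  L_INC:    acc++;    vmbreak;
  L_DEC:    acc--;    vmbreak;
  L_DOUBLE: acc *= 2; vmbreak;
  L_HALT:   return acc;
#undef vmbreak
}
```

Compiling this with 'gcc -O2 -S' and counting the indirect jumps in
run_goto should show whether a given compiler keeps one jump per opcode
or merges them into a single place, as described above. The result may
well differ between compilers and versions.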
-- Roberto