Re: Computed goto optimization of vanilla Lua

Subject: Re: Computed goto optimization of vanilla Lua
From: Rodrigo Azevedo <rodrigoams@ ... >
Date: Thu, 4 Feb 2016 16:33:58 -0300

With my research code I got a "speedup" of less than 1% (inconclusive). The programs are pure Lua and memory-hungry (many GB of table creation/destruction) and run for dozens of minutes. Compiled with gcc 5.2.1 and -O2.

2016-02-04 15:24 GMT-03:00 Roberto Ierusalimschy <roberto@inf.puc-rio.br>:

> I (re)did some quick tests. For your particular test, I got a "speedup"
> of 5%. In general, I got "speedups" around 5~8%. With clang (3.6), I
> got "speedups" around 2% in all my (few) tests. (The quotes mean that I
> am not sure whether these speedups are real, that is, due only to this
> change and consistent among several compilers, versions, platforms,
> tests, etc.) It would be great if other people could report their
> results for diverse environments and tests.

Well, I still hope to get some feedback.

Anyway, I checked the code generated by gcc again. (This is gcc 4.8.4
with -O2.) It still collapses all code generated by 'vmbreak' into one
single place. All opcodes end with an unconditional jump to this place,
and there it does the only computed goto (jmp *%rax) in the code. So,
it seems impossible to get any gains from branch prediction. Apart from
that, the code with 'switch' also uses a jump table, so the only concrete
gain (other than compiler idiosyncrasies) seems to be the elimination of
a bounds check (which is two instructions (cmpl/ja) with no memory
access and perfect branch prediction).

-- Roberto
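
For readers who want to reproduce the comparison above, here is a minimal
sketch of the two dispatch styles being discussed. It is not the Lua VM:
the three opcodes, the NEXT macro, and the names run_switch/run_goto are
hypothetical, made up only for illustration. The second variant relies on
the "labels as values" extension, so it assumes gcc or clang.

```c
/* Hypothetical toy bytecode, NOT Lua's real opcode set. */
enum { OP_PUSH, OP_ADD, OP_HALT, NUM_OPS };

/* Variant 1: plain 'switch' dispatch. The compiler typically emits a
   bounds check on the opcode, then one indirect jump through a table. */
static int run_switch(const int *code) {
    int stack[16];
    int sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH: stack[sp++] = *code++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      return -1;  /* unknown opcode */
        }
    }
}

/* Variant 2: computed goto (gcc/clang extension). There is no bounds
   check on the opcode, and each handler is written with its own
   indirect jump; the compiler may still merge them into one. */
static int run_goto(const int *code) {
    static void *const dispatch[NUM_OPS] = { &&do_push, &&do_add, &&do_halt };
    int stack[16];
    int sp = 0;
#define NEXT() goto *dispatch[*code++]
    NEXT();
do_push: stack[sp++] = *code++; NEXT();
do_add:  sp--; stack[sp - 1] += stack[sp]; NEXT();
do_halt: return stack[sp - 1];
#undef NEXT
}
```

Both functions should return the same value for the same program, for
example { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT }. Compiling with
"gcc -O2 -S" and counting the indirect jumps in run_goto is one way to
check whether the compiler kept one jump per handler or collapsed them
into a single place as described above.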

Rodrigo Azevedo Moreira da Silva