Re: Computed goto optimization of vanilla Lua

Some numbers from a "real world" test case:

Lua 5.3.2 - rc2
gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010

-O2 -- default

real    0m35.837s    0m35.561s    0m35.643s
user    0m35.760s    0m35.468s    0m35.552s
sys    0m0.096s    0m0.112s    0m0.104s

-O3

real    0m35.055s    0m34.888s    0m34.852s
user    0m34.932s    0m34.764s    0m34.760s
sys    0m0.140s    0m0.120s    0m0.100s

computed goto -O3 -fno-gcse -fno-crossjumping

real    0m33.984s    0m34.134s    0m34.109s
user    0m33.812s    0m34.028s    0m34.032s
sys    0m0.168s    0m0.128s    0m0.096s

So, a "speedup" of approx. 4% relative to the default -O2 build (mean real time 35.68s vs. 34.08s).

-- Using one.tar.gz [1]

one -O2

real    0m34.728s    0m34.798s    0m34.863s
user    0m34.628s    0m34.708s    0m34.752s
sys    0m0.108s    0m0.104s    0m0.128s

one -O3

real    0m33.656s    0m33.913s    0m33.738s
user    0m33.532s    0m33.824s    0m33.632s
sys    0m0.136s    0m0.096s    0m0.124s

one -O3 -fno-gcse -fno-crossjumping

real    0m33.562s    0m33.430s    0m33.508s
user    0m33.452s    0m33.272s    0m33.368s
sys    0m0.128s    0m0.156s    0m0.140s

one computed goto -O3 -fno-gcse -fno-crossjumping

real    0m32.962s    0m33.114s    0m33.156s
user    0m32.864s    0m33.028s    0m33.036s
sys    0m0.116s    0m0.108s    0m0.140s

one computed goto -O2 -fno-gcse -fno-crossjumping

real    0m34.285s    0m33.833s    0m33.900s
user    0m34.160s    0m33.728s    0m33.780s
sys    0m0.144s    0m0.124s    0m0.120s

one computed goto -O3 -fno-gcse -fno-crossjumping
without the "-DLUA_COMPAT_5_2"

real    0m32.897s    0m32.894s    0m33.316s
user    0m32.816s    0m32.756s    0m33.204s
sys    0m0.100s    0m0.148s    0m0.132s

So, a "speedup" of approx. 7-8% relative to the default -O2 build (mean real time 35.68s vs. 33.04s).

Conclusion: "one computed goto -O3 -fno-gcse -fno-crossjumping" may represent a real, if modest, improvement.
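
For anyone unfamiliar with the technique, here is a minimal sketch of what "computed goto" dispatch means, compared with a plain switch loop. It is NOT the code from lvm.c: the opcodes, the NEXT macro and the function names are hypothetical and only illustrate the idea. It relies on the GCC "labels as values" extension (&&label and goto *ptr), so it needs GCC or a compatible compiler.

```c
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (not Lua's). */
enum { OP_INC, OP_DBL, OP_DEC, OP_HALT };

/* Baseline: a single switch, so all opcodes share one dispatch point. */
long run_switch(const unsigned char *pc) {
    long acc = 0;
    for (;;) {
        switch (*pc++) {
        case OP_INC:  acc += 1; break;
        case OP_DBL:  acc *= 2; break;
        case OP_DEC:  acc -= 1; break;
        case OP_HALT: return acc;
        default:      return -1;   /* unknown opcode */
        }
    }
}

/* Computed goto: each handler ends with its own indirect jump.
 * Note: no bounds check on the opcode, the program must be valid. */
long run_goto(const unsigned char *pc) {
    /* Table of label addresses, indexed by opcode (GCC extension). */
    static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_dec, &&do_halt };
    long acc = 0;
#define NEXT() goto *dispatch[*pc++]
    NEXT();
do_inc:  acc += 1; NEXT();
do_dbl:  acc *= 2; NEXT();
do_dec:  acc -= 1; NEXT();
do_halt: return acc;
#undef NEXT
}
```

With the program { OP_INC, OP_DBL, OP_DBL, OP_DEC, OP_HALT } both functions return 3. The point of the second form is that every handler carries its own copy of the indirect jump, which the branch predictor can track separately. Cross-jumping unifies equivalent code tails, which is presumably why the handlers collapsed back into one shared jump in Roberto's earlier tests, hence -fno-crossjumping; the GCC manual itself suggests -fno-gcse for programs using computed gotos.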

[1] http://lua-users.org/lists/lua-l/2015-01/msg00334.html

2016-02-05 5:13 GMT-03:00 Alex Silva <asandroq@gmail.com>:

Hallo,

On 04/02/16 12:45, Roberto Ierusalimschy wrote:
>
> These macros are there exactly for this reason. However, my earlier
> tests (a few years ago, when we introduced the macros) did not show
> any perceptible improvement. In particular, the GCC compiler insisted
> on applying a space optimization that merged the common code at the
> end of different branches in a conditional (most of 'vmbreak', in our
> case), therefore throwing away any possibility of optimized branch
> predictions. (All opcodes ended up using the same indirect jump in the
> final code.) I will try it again.
>

Did you compile the code with '-fno-gcse -fno-crossjumping'?

Cheers,
--
-alex
http://unendli.ch/

Rodrigo Azevedo Moreira da Silva