• Subject: Re: Puzzled by simple test with Lua 5.3
• From: Tim Hill <drtimhill@...>
• Date: Sun, 13 Jul 2014 11:10:33 -0700

On Jul 13, 2014, at 6:53 AM, Geoff Smith <spammealot1@live.co.uk> wrote:

The results below show that the default 64-bit integer for-loop variable is 20% or so slower than a double. I wondered how much of that is due to the Lua 5.3 implementation and how much is inherent in the CPU silicon.

As a quick test to try to answer that, I wrote a few different for loops in raw C code.

That gave me approximately:

C code, int32 loop variable:   too fast to measure (under 1 ms)
C code, int64 loop variable:   9.9 s
C code, double loop variable:  19.7 s
C code, float loop variable:   53 s
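The C source behind these timings was not posted. Below is a minimal sketch of what such a four-way comparison might look like; the loop bodies, the function names and the `time_loop` helper are assumptions for illustration, not Geoff's code.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical loop variants: each counts its own iterations, so the
   result can be checked and used rather than thrown away. */
int64_t loop_int32(int32_t n) {
    int64_t count = 0;
    for (int32_t i = 0; i < n; i++) count++;
    return count;
}

int64_t loop_int64(int32_t n) {
    int64_t count = 0;
    for (int64_t i = 0; i < n; i++) count++;
    return count;
}

int64_t loop_double(int32_t n) {
    int64_t count = 0;
    for (double d = 0.0; d < n; d += 1.0) count++;
    return count;
}

/* Assumes n <= 16777216 (2^24) so that every value of f is exact. */
int64_t loop_float(int32_t n) {
    int64_t count = 0;
    for (float f = 0.0f; f < (float)n; f += 1.0f) count++;
    return count;
}

typedef int64_t (*loop_fn)(int32_t n);

/* Storing the result in a volatile keeps the call from being discarded,
   but an optimizing compiler may still fold the loop into a constant. */
static volatile int64_t sink;

/* Returns the CPU time, in seconds, taken by one call of fn(n). */
double time_loop(loop_fn fn, int32_t n) {
    clock_t start = clock();
    sink = fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For example, `time_loop(loop_int64, 100000000)` times 10^8 int64 iterations. Compile with optimization and check the generated assembly before trusting any of the numbers.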

In raw C, the int64 loop takes roughly half the execution time of the double loop (9.9 s vs. 19.7 s). So this might indicate there is room for LHF and Roberto to optimise Lua 5.3 some more, so that we don't see the performance hit caused by 64-bit integers on a fast PC?

Another surprising number was how bad it is to use a float as the loop variable. I guess this must be because the CPU's floating-point silicon is designed and optimised for double-precision numbers.
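Separate from speed, a float loop variable has a correctness limit worth keeping in mind. A float has a 24-bit significand, so once a counter reaches 2^24 = 16777216, adding 1.0f no longer changes it. The sketch below (the function name is hypothetical) checks whether one increment still advances a given value.

```c
#include <stdbool.h>

/* True if start + 1.0f rounds back to start in single precision.
   At 16777216.0f this happens, so a loop such as
   for (float f = 0; f < 1e8f; f += 1.0f) would never terminate. */
bool float_increment_stalls(float start) {
    volatile float next = start + 1.0f;  /* volatile forces a real float store */
    return next == start;
}
```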

Lua 5.3 default build (double and 64-bit integers)

  Test     Time (s)   Comment
  Test 1   15.251
  Test 2   11.942     Using doubles is 21.7% faster
  Test 3    7.476
  Test 4    6.137     Using doubles is 17.9% faster

Lua 5.3 modified build (double and 32-bit integers)

  Test     Time (s)   Comment
  Test 1   10.213
  Test 2   12.122     Using doubles is 18.7% slower
  Test 3    5.564
  Test 4    6.268     Using doubles is 12.7% slower

Notes:
  Test 1: integer loops, for i=0, 10000 do
  Test 2: non-integer loops, for i=0.1, 10000.1 do
  Test 3: integer loops with an empty loop body
  Test 4: non-integer loops with an empty loop body
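The percentages in the Comment column can be reproduced by taking the difference between the double-loop time and the integer-loop time as a fraction of the integer-loop time. A small sketch of that arithmetic (the helper name is hypothetical):

```c
/* Percentage by which the double loop is slower than the integer loop,
   relative to the integer time. A negative value means the double loop
   was faster by that many percent. */
double percent_slower(double int_secs, double dbl_secs) {
    return (dbl_secs - int_secs) / int_secs * 100.0;
}
```

For the default build, `percent_slower(15.251, 11.942)` gives about -21.7, matching "Using doubles is 21.7% faster"; for the modified build, `percent_slower(10.213, 12.122)` gives about 18.7.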

Interesting numbers. A couple of comments…

Benchmarking C code is tricky even for simple "for" loops. Non-optimized code (-O0) will of course not give a representative result, but something like -O2 (or /O2 with Visual C++) will often result in the compiler optimizing the loop out of existence (literally). You should generate the assembly language output (for example with gcc -S) and inspect it to see what the compiler is actually generating.
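To illustrate how a loop can vanish: the loop below has a result the compiler can work out without running it, so at -O2 an optimizer may replace it with the closed form shown next to it, or drop it entirely if the result is unused. A timing of such a loop then measures almost nothing, which could be one reason a loop shows up as "too fast to measure". Both functions are hypothetical examples.

```c
#include <stdint.h>

/* Sums 0 + 1 + ... + (n - 1) by brute force. */
uint64_t sum_loop(uint32_t n) {
    uint64_t sum = 0;
    for (uint32_t i = 0; i < n; i++) sum += i;
    return sum;
}

/* Same value in closed form, n(n - 1)/2. For n == 0 the wrapped
   (n - 1u) is multiplied by zero, so the result is still 0. */
uint64_t sum_closed(uint32_t n) {
    return (uint64_t)n * (n - 1u) / 2;
}
```

Comparing the -O0 and -O2 assembly for `sum_loop` is a quick way to see whether the loop survived.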

The performance of 64-bit integers on a 32-bit target can also vary widely depending on the compiler, CPU, and optimization settings. Most 32-bit instruction sets do have some support for 64-bit operations, but of course whether it is used depends on the compiler. GCC and clang both make good use of the 64-bit instructions even when compiling for 32-bit *IF* the compiler is told that they are available (for example via -march; otherwise it must compile for a generic x86), and this can make a huge difference.
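As a rough picture of the extra work: without native 64-bit adds, a 32-bit x86 compiler typically lowers an int64 addition to a pair of 32-bit adds, the second one consuming the carry from the first (add/adc). The portable sketch below performs the same steps in C; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit words. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* Adds two 64-bit values using only 32-bit arithmetic. */
u64_pair add64_via_32(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;            /* low word, wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;  /* a wrap means a carry out of the low word */
    r.hi = a.hi + b.hi + carry;    /* high word absorbs the carry */
    return r;
}
```

Every int64 increment and compare in a loop needs this kind of two-word work unless the compiler can use wider instructions, which is why target flags matter so much.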

The C float vs. double times are curious. I suspect this is the result of the extra casts between single and double inserted by the compiler.
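One common source of such conversions: in C an unsuffixed floating constant such as 1.0 has type double, so mixing it with a float widens the float to double and narrows the result back on assignment. The two hypothetical loops below count the same iterations; the first mixes in doubles, the second stays in single precision via float-typed operands. A compiler may still remove some conversions when it can prove the result is unchanged, so the assembly is again the place to check.

```c
/* f is widened to double for the compare and the add, and the sum is
   narrowed back to float on every assignment. */
long count_mixed(int limit) {
    long n = 0;
    for (float f = 0.0f; f < (double)limit; f = f + 1.0) n++;
    return n;
}

/* Same loop with float operands throughout: no conversions needed. */
long count_single(int limit) {
    long n = 0;
    for (float f = 0.0f; f < (float)limit; f = f + 1.0f) n++;
    return n;
}
```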

—Tim