lua-l archive




Yes, there has.

On FPU machines, performance is usually within +/-2% of a pure FPU solution, so there is no gain, but no significant penalty either.

On non-FPU machines, speedups are generally in the range of 20-50%, sometimes more (run time 5.94 -> 3.67 s = -38.2%; 44.0 -> 7.32 s = -83.4%!)
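The percentages quoted are relative changes in run time, (new - old) / old, with a negative value meaning the integer build is faster. A minimal Python sketch of that arithmetic, applied to the two NSLU2 pairs above (the function name is illustrative, not part of the patch):

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change in run time from old to new; negative means faster."""
    return (new - old) / old * 100.0

# NSLU2 pairs quoted above: speed test, then test/life.lua (double vs double+int).
for old, new in [(5.94, 3.67), (44.0, 7.32)]:
    print(f"{old} -> {new} = {relative_change(old, new):.1f}%")
```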

All measurements depend on the exact patch version, the Lua version, and other factors such as machine firmware and runtime libraries. The tests should therefore be rerun on each target where the integer patch is being considered. Generally, on non-FPU systems, a total speedup of about 30% is a reasonable expectation.

svn co svn://slugak.dyndns.org/public/lnum/lnum-100906.patch

I must mention on the PowerPatch page that the recent developments are there.

-asko



--------------------------
   Performance results:
--------------------------

AK 24-Jul-06:
NOTE: THESE PERFORMANCE FIGURES WERE DONE ON AN OLDER VERSION OF THE INT-OPT PATCH.
      Use them as guidance only.

To make these performance tests, run:
        make perftest-float
        make perftest-float-int
        make perftest-double
        make perftest-double-int

The sum of the "user" and "sys" values is what matters. The NSLU2 uses "soft float" (run in user mode), while the N770 uses "hard float" (run in sys mode).
That is why the sys parts of the NSLU2 numbers are largely irrelevant.

Each test is run multiple times, to average out statistical noise in the results.
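Summing user and sys time and repeating each run can both be automated. A minimal Python sketch, Unix-only because it relies on the standard `resource` module; the helper names are hypothetical and are not part of the patch's make targets:

```python
import resource
import statistics
import subprocess

def user_plus_sys(cmd: list[str]) -> float:
    """Run cmd once and return the user+sys CPU seconds it consumed."""
    # RUSAGE_CHILDREN accumulates over all terminated, waited-for children,
    # so take the difference around this single run.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)

def benchmark(cmd: list[str], runs: int = 4) -> tuple[float, list[float]]:
    """Repeat the same run to expose noise; return (mean, individual samples)."""
    samples = [user_plus_sys(cmd) for _ in range(runs)]
    return statistics.mean(samples), samples
```

For example, `benchmark(["lua", "speed-test.lua", "1000"])` would time the speed test four times, assuming a `lua` binary on the PATH. Wall-clock ("real") time is deliberately not counted.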

Performance tests must always be done with "-O2" optimization (no debug info
or asserts!) and stripped binaries.

The binaries are built without readline or other such extras.


** iMac PowerPC G4 700MHz/512 MB, OS X 10.4.4 **

Sat Jan 21 00:51:29 EET 2006 / Asko Kauppi

Note: These measurements are 'user' values only; they should be 'user'+'sys',
      since both matter.  Sorry. :)

float:          1.07 1.10 1.10 1.09  (1.0900 = 100%)    143 792 bytes
float+int32:    1.11 1.08 1.08 1.08  (1.0875 =  99.8%)  147 920 bytes (+4128)
float+int64:*   1.177+0.037=1.214  1.167+0.031=1.198  1.177+0.032=1.209  (1.207 = 110.7%)
double:         1.14 1.12 1.11 1.11  (1.1200 = 102.8%)  143 780 bytes (-12)
double+int32:   1.14 1.11 1.11 1.11  (1.1175 = 102.5%)  147 908 bytes (+4116)
double+int64:*  1.18+0.036=1.216  1.176+0.029=1.205  1.173+0.029=1.202  (1.208 = 110.8%)

*) measured with 5.1rc4, therefore not completely comparable with the other figures.
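Each row reduces the raw runs to a mean, expresses that mean as a percentage of the plain float mean, and gives the binary size difference from the float build. A short Python sketch of that bookkeeping, using the iMac figures above (the `summarize` helper is illustrative):

```python
from statistics import mean

# Baseline: the plain float build on the iMac G4 (mean of four 'user' times).
BASELINE_TIME = mean([1.07, 1.10, 1.10, 1.09])  # 1.09 s
BASELINE_SIZE = 143_792                          # bytes

def summarize(runs, size):
    """Return (mean time, % of baseline time, size delta in bytes)."""
    avg = mean(runs)
    return avg, avg / BASELINE_TIME * 100.0, size - BASELINE_SIZE

for name, runs, size in [
    ("float+int32", [1.11, 1.08, 1.08, 1.08], 147_920),
    ("double", [1.14, 1.12, 1.11, 1.11], 143_780),
    ("double+int32", [1.14, 1.11, 1.11, 1.11], 147_908),
]:
    avg, pct, delta = summarize(runs, size)
    print(f"{name}: ({avg:.4f} = {pct:.1f}%) {size} bytes ({delta:+d})")
```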


** NSLU2 ARM 266MHz/32 MB, Linux unslung 2.4.22 **

Sat Jan 21 02:34:22 EET 2006 / Asko Kauppi

NSLU2 'time' gives three decimals, but the last one is always zero
(and was left out of the statistics here).

float:       5.06+0.14=5.20  5.11+0.09=5.20  (5.200 = 100%)    126 472 bytes
float+int:   3.39+0.07=3.46  3.34+0.12=3.46  (3.460 =  66.5%)  130 020 bytes (+3548)
double:      5.85+0.09=5.94  5.88+0.06=5.94  (5.940 = 114.2%)  127 224 bytes (+752)
double+int:  3.58+0.09=3.67  3.61+0.06=3.67  (3.670 =  70.6%)  130 724 bytes (+4252)

Running test/life.lua for 200 cycles, timed with a watch (note: first run "make perftest-xx" so that the binaries are optimized
and built without debug and assert information):

float:          39.60 sec  (100%)
float+int:       7.08 sec  ( 17.9%)
double:         44.00 sec  (111.1%)
double+int:      7.32 sec  ( 18.5%)


** N770 ARM 200MHz/64 MB, w45/2005 rootstrap **

Mon Jan 23 02:01:54 EET 2006 / Asko Kauppi

With the N770, the system part of the timings is noticeable for the non-integer variants.
Interestingly, double+int is faster than float+int.

'time lua speed-test.lua 1000' (user+sys):
float:       5.53+3.16=8.69  5.54+3.10=8.64  (8.665 = 100%)    126 740 bytes
float+int:   5.53+0.28=5.81  6.35+0.27=6.62  (6.215 =  71.7%)  129 864 bytes (+3124)
double:      5.19+4.05=9.24  5.03+4.09=9.12  (9.180 = 105.9%)  127 492 bytes (+752)
double+int:  5.43+0.26=5.69  5.41+0.31=5.72  (5.705 =  65.8%)  130 584 bytes (+3844)

'time lua speed-test.lua 10000':
float:       real 86.74  user 53.60  sys 33.01  -> user+sys 86.61  (100%)
float+int:   real 65.61  user 62.31  sys 02.68  -> user+sys 64.99  ( 75.0%)
double:      real 92.70  user 51.39  sys 40.23  -> user+sys 91.62  (105.8%)
double+int:  real 57.64  user 54.67  sys 02.67  -> user+sys 57.34  ( 66.2%)
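The remark that the system part is noticeable for the non-integer variants can be checked against the 10000-iteration figures. A small Python sketch that computes each build's user+sys relative to float and the share of that total spent in 'sys' (the variable names are illustrative):

```python
# N770, 'time lua speed-test.lua 10000' figures listed above: (user, sys) seconds.
timings = {
    "float": (53.60, 33.01),
    "float+int": (62.31, 2.68),
    "double": (51.39, 40.23),
    "double+int": (54.67, 2.67),
}

base = sum(timings["float"])  # user+sys of the float build = 100%

for name, (user, system) in timings.items():
    total = user + system
    print(f"{name:11s} user+sys {total:6.2f} ({total / base * 100:5.1f}%), "
          f"sys share {system / total * 100:4.1f}%")
```

The float and double builds spend roughly 38% and 44% of their CPU time in 'sys', compared with under 5% for the integer builds.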


Running test/life.lua in terminal fullscreen mode, 200 cycles
(50 or 100 cycles measured and extrapolated):

float:          85.64 sec  (100%)
float+int:      28.48 sec  ( 33.3%)
double:         92.12 sec  (107.6%)
double+int:     28.64 sec  ( 33.4%)
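Because only 50 or 100 cycles were timed on the N770, the 200-cycle figures are scaled up. A minimal Python sketch of that scaling, assuming linear extrapolation (constant time per generation), which the post does not state explicitly; the function name and the example inputs are illustrative, not measured values:

```python
def extrapolate(seconds: float, cycles: int, target: int = 200) -> float:
    """Scale a partial life.lua timing linearly to `target` cycles.

    Assumes every generation costs about the same time; the measurements
    above do not verify this.
    """
    return seconds * target / cycles

# Illustrative inputs only, not measured values:
print(extrapolate(10.0, 50))   # 50 cycles measured  -> 40.0
print(extrapolate(30.0, 100))  # 100 cycles measured -> 60.0
```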




On Tue, 26 Sep 2006 07:39:28 -0500
 "Thomas Harning Jr." <harningt@gmail.com> wrote:
Has there been any performance checks on the integer optimization? Could it be faster on FPU machines? I assume it /would/ be faster on an intel XSCALE IX425... However I don't think I do much math inside
my proxy app.

--
Thomas Harning Jr.