- Subject: Re: [PATCH] bastardized lua #1
- From: askok@...
- Date: Tue, 26 Sep 2006 17:23:23 +0300
Yes, there have.
On FPU machines, performance is usually within +-2% of
a pure FPU solution, so there is no gain but not much of
a penalty either.
On non-FPU machines, speedups are typically on the order of
20-50%, or even more (5.94 -> 3.67 = -38.2%; 44.0 -> 7.32
= -83.4%!)
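The percentages in parentheses are relative changes in run time, i.e. (new - old) / old. A minimal Python sketch of that arithmetic follows; `pct_change` is a name made up for this illustration, and the inputs are the two NSLU2 "double" vs. "double+int" pairs quoted above.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent (negative means faster)."""
    return (new - old) / old * 100.0

# Timings in seconds, copied from the figures quoted in this mail.
pairs = [(5.94, 3.67), (44.0, 7.32)]
for old, new in pairs:
    print(f"{old} -> {new}: {pct_change(old, new):+.1f}%")
```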
Any measurement depends on the exact patch version, Lua
version, and other factors such as machine firmware and
runtime libraries. The tests should therefore be rerun on
each target for which the integer patch is being
considered. Generally, on non-FPU systems, a total speedup
of 30% would be a reasonable expectation.
svn co svn://slugak.dyndns.org/public/lnum/lnum-100906.patch
I must mention on the PowerPatch page that the recent
developments are there.
-asko
--------------------------
Performance results:
--------------------------
AK 24-Jul-06:
NOTE: THESE PERFORMANCE FIGURES WERE MEASURED ON AN OLDER
VERSION OF THE INT-OPT PATCH.
Use them as guidance only.
To run these performance tests, use:
make perftest-float
make perftest-float-int
make perftest-double
make perftest-double-int
The sum of the "user" and "sys" values is what matters. The
NSLU2 evidently uses "soft float" (run in user mode), while
the N770 uses "hard float" (run in sys mode). That is why
the sys parts of the NSLU2 numbers are largely irrelevant.
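For anyone who wants to collect the user+sys sum without reading it off 'time' by hand, here is a rough Python sketch. It is only an illustration under assumptions: `cpu_seconds` is a made-up helper, the workload is a placeholder Python one-liner standing in for the Lua binary, and the children fields of `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys

def cpu_seconds(cmd):
    """Run cmd and return (user, sys, user + sys) CPU seconds of the child."""
    before = os.times()
    subprocess.run(cmd, check=True)  # waits for the child to finish
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return user, system, user + system

# Placeholder workload (assumption); on a real target this would be
# something like ["lua", "speed-test.lua", "1000"].
cmd = [sys.executable, "-c", "sum(i * 0.5 for i in range(200000))"]
user, system, total = cpu_seconds(cmd)
print(f"user {user:.2f}  sys {system:.2f}  user+sys {total:.2f}")
```

Comparing the user and sys parts separately also shows where the float work ends up on a given machine, which is the NSLU2 vs. N770 difference described above.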
The tests are run multiple times, to even out statistical
noise in the results.
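As a rough way to judge that noise, one can look at the mean of the repeated runs together with their spread. The sketch below is only an illustration: `summarize` is a made-up helper, and the input is the four iMac "float" timings listed further down.

```python
from statistics import mean

def summarize(times):
    """Return (mean, spread), with spread = (max - min) / mean."""
    avg = mean(times)
    return avg, (max(times) - min(times)) / avg

# 'user' times in seconds for the iMac "float" build, copied from this mail.
avg, spread = summarize([1.07, 1.10, 1.10, 1.09])
print(f"mean {avg:.4f} s, spread {spread:.1%}")
```

A difference between two variants that is smaller than this spread should probably be read as "no measurable difference" rather than as a gain or a loss.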
Performance tests must always be done with "-O2"
optimization (NO debug info or asserts!) and stripped
binaries. The binaries are built without readline or other
such extras.
** iMac PowerPC G4 700MHz/512 MB, OS X 10.4.4 **
Sat Jan 21 00:51:29 EET 2006 / Asko Kauppi
Note: These measurements are 'user' values only; they should
be 'user'+'sys', since both affect the result. Sorry. :)
variant        runs (s)                                                 mean (s)  rel.     size (bytes)
float          1.07  1.10  1.10  1.09                                   1.0900    100%     143 792
float+int32    1.11  1.08  1.08  1.08                                   1.0875    99.8%    147 920 (+4128)
float+int64*   1.177+0.037=1.214  1.167+0.031=1.198  1.177+0.032=1.209  1.207     110.7%
double         1.14  1.12  1.11  1.11                                   1.1200    102.8%   143 780 (-12)
double+int32   1.14  1.11  1.11  1.11                                   1.1175    102.5%   147 908 (+4116)
double+int64*  1.18+0.036=1.216   1.176+0.029=1.205  1.173+0.029=1.202  1.208     110.8%
*) measured with 5.1rc4, therefore not completely
comparable with the other figures.
** NSLU2 ARM 266MHz/32 MB, Linux unslung 2.4.22 **
Sat Jan 21 02:34:22 EET 2006 / Asko Kauppi
NSLU2 'time' gives three decimals, but the lowest is always
zero (and was left out of the statistics here).
variant      runs, user+sys (s)                mean (s)  rel.     size (bytes)
float        5.06+0.14=5.20  5.11+0.09=5.20    5.200     100%     126 472
float+int    3.39+0.07=3.46  3.34+0.12=3.46    3.460     66.5%    130 020 (+3548)
double       5.85+0.09=5.94  5.88+0.06=5.94    5.940     114.2%   127 224 (+752)
double+int   3.58+0.09=3.67  3.61+0.06=3.67    3.670     70.6%    130 724 (+4252)
Running test/life.lua and measuring the time (with a watch)
for 200 cycles:
(note: first run "make perftest-xx" so that the binaries are
optimized and built without debug & assert information)
float: 39.60 sec (100%)
float+int: 7.08 sec ( 17.9%)
double: 44.00 sec (111.1%)
double+int: 7.32 sec ( 18.5%)
** N770 ARM 200MHz/64 MB, w45/2005 rootstrap **
Mon Jan 23 02:01:54 EET 2006 / Asko Kauppi
With the N770, system part of timings is noticeable for
non-integer variants.
Interestingly, double+int is faster than float+int.
'time lua speed-test.lua 1000' (user+sys):
variant      runs, user+sys (s)                mean (s)  rel.     size (bytes)
float        5.53+3.16=8.69  5.54+3.10=8.64    8.665     100%     126 740
float+int    5.53+0.28=5.81  6.35+0.27=6.62    6.215     76.6%    129 864 (+3120)
double       5.19+4.05=9.24  5.03+4.09=9.12    9.180     105.9%   127 492 (+752)
double+int   5.43+0.26=5.69  5.41+0.31=5.72    5.705     65.8%    130 584 (+3844)
'time lua speed-test.lua 10000':
variant      real    user    sys     user+sys  rel.
float        86.74   53.60   33.01   86.61     100%
float+int    65.61   62.31    2.68   64.99     75.0%
double       92.70   51.39   40.23   91.62     105.8%
double+int   57.64   54.67    2.67   57.34     66.2%
Running test/life.lua, in terminal fullscreen mode.
200 cycles (50, or 100 cycles measured & extrapolated):
float: 85.64 sec (100%)
float+int: 28.48 sec ( 33.3%)
double: 92.12 sec (107.6%)
double+int: 28.64 sec ( 33.4%)
On Tue, 26 Sep 2006 07:39:28 -0500 "Thomas Harning Jr." <harningt@gmail.com> wrote:
Has there been any performance checks on the integer optimization?
Could it be faster on FPU machines? I assume it /would/ be faster on
an intel XSCALE IX425... However I don't think I do much math inside
my proxy app.
--
Thomas Harning Jr.