Re: [PATCH] bastardized lua #1

lua-l archive


Subject: Re: [PATCH] bastardized lua #1
From: askok@...
Date: Tue, 26 Sep 2006 17:23:23 +0300


Yes, there has.

On FPU machines, the performance is usually within +-2% of a pure FPU solution, so no gain but not much of a penalty either.

On non-FPU machines, general speedups are of the magnitude 20-50%, or even above (5.94 -> 3.67 = -38.2%; 44.0 -> 7.32 = -83.3%!)
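The percentages above are reductions in run time, which is not the same number as the gain in speed. A minimal Python sketch of the arithmetic, using the 5.94 -> 3.67 pair from the NSLU2 figures; the function names are made up for illustration and are not part of the patch or its test scripts:

```python
def time_change_percent(before: float, after: float) -> float:
    """Relative change in run time, in percent (negative means faster)."""
    return (after - before) / before * 100.0


def speed_ratio(before: float, after: float) -> float:
    """How many times faster the 'after' run is compared with 'before'."""
    return before / after


if __name__ == "__main__":
    before, after = 5.94, 3.67  # double vs. double+int on the NSLU2
    print(f"time change: {time_change_percent(before, after):+.1f}%")
    print(f"speed ratio: {speed_ratio(before, after):.2f}x")
```

A run time that drops by roughly 38% therefore means the program runs about 1.6 times as fast.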

Any measurements are subject to the exact patch version, Lua version, and other issues such as machine firmware and runtime libraries. Therefore the tests should best be rerun on each target for which the integer patch is being considered. Generally, on non-FPU systems, a total speedup of 30% would be a reasonable expectation.

svn co svn://slugak.dyndns.org/public/lnum/lnum-100906.patch

I must mention on the PowerPatch page that the recent developments are there.


-asko



--------------------------
   Performance results:
--------------------------

AK 24-Jul-06:

NOTE: THESE PERFORMANCE FIGURES WERE MEASURED ON AN OLDER VERSION OF THE INT-OPT PATCH.
      Use them as guidance only.

To make these performance tests, run:
        make perftest-float
        make perftest-float-int
        make perftest-double
        make perftest-double-int

The sum of the "user" and "sys" values is what matters. The NSLU2 obviously uses "soft float" (run in user mode) while the N770 uses "hard float" (run in sys mode). That's why in the NSLU2 numbers the sys parts are rather irrelevant.

The tests are run multiple times, to even out statistical noise in the results.
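The "user + sys, averaged over several runs" measurement can also be scripted instead of reading the output of 'time' by hand. The following Python sketch is only an illustration and was not used for the figures in this mail; it assumes a Unix-like system (where os.times() reports CPU time of waited-for child processes), and the helper names are hypothetical:

```python
import os
import subprocess
import sys


def cpu_seconds(cmd):
    """Run cmd once and return the (user, sys) CPU seconds used by the child."""
    before = os.times()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = os.times()
    return (after.children_user - before.children_user,
            after.children_system - before.children_system)


def mean_user_plus_sys(cmd, runs=4):
    """Average user+sys CPU time of cmd over several runs."""
    totals = [sum(cpu_seconds(cmd)) for _ in range(runs)]
    return sum(totals) / len(totals)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere; on a real target this
    # would be something like ["lua", "speed-test.lua", "1000"].
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    print(f"mean user+sys: {mean_user_plus_sys(cmd):.3f} s")
```

Wall-clock ("real") time is deliberately ignored here, since it also includes time spent waiting on other processes.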

Performance tests must always be done with "-O2" optimization (NO debug info or asserts!) and stripped binaries.

The binaries are without readline, or other such extras.


** iMac PowerPC G4 700MHz/512 MB, OS X 10.4.4 **

Sat Jan 21 00:51:29 EET 2006 / Asko Kauppi

Note: These measurements are 'user' values only; they should be 'user'+'sys',
      since both count.  Sorry. :)

float:          1.07 1.10 1.10 1.09  (1.0900 = 100%)    143 792 bytes
float+int32:    1.11 1.08 1.08 1.08  (1.0875 = 99.8%)   147 920 bytes (+4128)
float+int64:*   1.177+0.037 =1.214  1.167+0.031 =1.198  1.177+0.032 =1.209  (1.207 = 110.7%)
double:         1.14 1.12 1.11 1.11  (1.1200 = 102.8%)  143 780 bytes (-12)
double+int32:   1.14 1.11 1.11 1.11  (1.1175 = 102.5%)  147 908 bytes (+4116)
double+int64:*  1.18+0.036 =1.216  1.176+0.029 =1.205  1.173+0.029 =1.202  (1.208 = 110.8%)

*) measured with 5.1rc4, therefore not completely comparable with the other figures.



** NSLU2 ARM 266MHz/32 MB, Linux unslung 2.4.22 **

Sat Jan 21 02:34:22 EET 2006 / Asko Kauppi

NSLU2 'time' gives three decimals, but the lowest is always zero
(and was left out of the statistics here)

float:       5.06+0.14 =5.20  5.11+0.09 =5.20  (5.200 = 100%)    126 472 bytes
float+int:   3.39+0.07 =3.46  3.34+0.12 =3.46  (3.460 = 66.5%)   130 020 bytes (+3548)
double:      5.85+0.09 =5.94  5.88+0.06 =5.94  (5.940 = 114.2%)  127 224 bytes (+752)
double+int:  3.58+0.09 =3.67  3.61+0.06 =3.67  (3.670 = 70.6%)   130 724 bytes (+4252)
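The bracketed figures in the table above are the mean of user+sys over the runs, expressed as a percentage of the plain "float" build. A small Python sketch of that normalization, with the NSLU2 timings copied from the table; the dictionary layout and function names are assumptions made for the example:

```python
# NSLU2 speed-test timings from the table above: (user, sys) seconds per run.
RUNS = {
    "float":      [(5.06, 0.14), (5.11, 0.09)],
    "float+int":  [(3.39, 0.07), (3.34, 0.12)],
    "double":     [(5.85, 0.09), (5.88, 0.06)],
    "double+int": [(3.58, 0.09), (3.61, 0.06)],
}


def mean_total(samples):
    """Mean of user+sys over all runs of one build."""
    return sum(user + sys_ for user, sys_ in samples) / len(samples)


def relative_to(baseline, table):
    """Mean run time of every build as a percentage of the baseline build."""
    base = mean_total(table[baseline])
    return {name: 100.0 * mean_total(samples) / base
            for name, samples in table.items()}


if __name__ == "__main__":
    for name, pct in relative_to("float", RUNS).items():
        print(f"{name:<11} {mean_total(RUNS[name]):.3f} s  {pct:5.1f}%")
```

Rerunning the tests on another target only requires replacing the timings in RUNS.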

Running test/life.lua and measuring the time (with a watch) of 200 cycles:
(note: do this by first running "make perftest-xx" to have the binaries optimized, and without debug & assert information)

float:          39.60 sec  (100%)
float+int:       7.08 sec  ( 17.9%)
double:         44.00 sec  (111.1%)
double+int:      7.32 sec  ( 18.5%)


** N770 ARM 200MHz/64 MB, w45/2005 rootstrap **

Mon Jan 23 02:01:54 EET 2006 / Asko Kauppi

With the N770, the system part of the timings is noticeable for the non-integer variants.

Interestingly, double+int is faster than float+int.

'time lua speed-test.lua 1000' (user+sys):

float:       5.53+3.16 =8.69  5.54+3.10 =8.64  (8.665 = 100%)    126 740 bytes
float+int:   5.53+0.28 =5.81  6.35+0.27 =6.62  (6.215 = 76.6%)   129 864 bytes (+3120)
double:      5.19+4.05 =9.24  5.03+4.09 =9.12  (9.180 = 105.9%)  127 492 bytes (+752)
double+int:  5.43+0.26 =5.69  5.41+0.31 =5.72  (5.705 = 65.8%)   130 584 bytes (+3844)


'time lua speed-test.lua 10000':

float:       real 86.74  user 53.60  sys 33.01  -> user+sys 86.61  (100%)
float+int:   real 65.61  user 62.31  sys 02.68  -> user+sys 64.99  ( 75.0%)
double:      real 92.70  user 51.39  sys 40.23  -> user+sys 91.62  (105.8%)
double+int:  real 57.64  user 54.67  sys 02.67  -> user+sys 57.34  ( 66.2%)



Running test/life.lua, in terminal fullscreen mode.
200 cycles (50, or 100 cycles measured & extrapolated):

float:          85.64 sec  (100%)
float+int:      28.48 sec  ( 33.3%)
double:         92.12 sec  (107.6%)
double+int:     28.64 sec  ( 33.4%)




On Tue, 26 Sep 2006 07:39:28 -0500
 "Thomas Harning Jr." <harningt@gmail.com> wrote:

Has there been any performance checks on the integer optimization?
Could it be faster on FPU machines? I assume it /would/ be faster on
an intel XSCALE IX425... However I don't think I do much math inside
my proxy app.

--
Thomas Harning Jr.

References:
- [PATCH] bastardized lua #1, Karel Tuma
- Re: [PATCH] bastardized lua #1, Thomas Harning Jr.
