Microbenching table lookup (Was Re: Pentium 4 and misaligned doubles)

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Microbenching table lookup (Was Re: Pentium 4 and misaligned doubles)
From: Rici Lake <lua@...>
Date: Tue, 16 Aug 2005 12:58:28 -0500


On 16-Aug-05, at 12:13 PM, Rici Lake wrote:

I had to specify -msse2 for this to work on gcc 3.3.3

And it did make a bit of a difference, but only in some possiblyunusual cases.


The pattern here is:

local t; local key=$1; local a = {[key] = 3}; for i = 1, 1e7 do local j= a[key] end

Full results, not converted to cycles (only user time), and withoutsubtracting the cost of the for loop itself (about 0.12)


                                    normalization patch (note 6)
key          no sse   sse    note     no sse    sse
1             1.22    1.17    (1)      1.15     1.04
27            1.22    1.19    (2)      1.16     1.03
2^31-1        1.21    1.20             1.15     1.04
2^31          5.67    1.12    (3)      5.64     1.11
1e200         5.65    1.13             5.63     1.11
1/0           9.64    5.19    (4)      5.63     1.13
2^-1022       1.12    1.13             1.07     1.11
2^-1023       4.67    7.50    (5)      1.07     8.23
true          0.52    0.50             0.52     0.50
'a'           0.45    0.45             0.45     0.44
{}            0.83    0.80             0.83     0.80
function()end 0.82    0.81             0.82     0.80

Conclusion: Lua is optimised for string lookups in hash tables andinteger lookups in the array part. That seems appropriate to me. Thepenalty for use of an object is noticeable but not outrageous. Floatingpoint keys are problematic in certain cases, but probably not commoncases. In any event, replacing the floating point normalization with anexplicit check for 0 is probably an improvement.


Notes:

0) The test does not account for hash collisions. Floating point keysare normalized by adding 1 to the key, so considerable precision willbe lost for very small keys; in particular, all keys in the range(-2^-52, 2^-52) hash to the same place.

1) The pattern with key=1 does not create a as an array, so the lookupis in the hash part of a. Contrast:

local t; local key=1; local a = {3}; for i = 1, 1e7 do local j = a[key]end

        0.55 real         0.53 user         0.01 sys

local t; local key=1; local a = {[key] = 3}; for i = 1, 1e7 do local j= a[key] end

        1.22 real         1.22 user         0.00 sys

2) The SSE2 conversion does not seem to be measurably faster in thecase where the conversion does not overflow (but see below).

3) 2^31 (and 1e200) cause integer overflow on double-to-int conversion.My guess is that the SSE2 instructions avoid a penalty in this case.

4) This is the penalty for compuations with infinity. Apparently, SSE2is used (if available) for the initial cast to integer, but not for thenormalizing addition of 1 in the case that the hash is computed fromthe floating point representation.

5) 2^-1023 is denormalized. This seems to trigger a slow-down with theSSE2 conversion; at least, that's the only explanation I can come upwith.

6) A considerable amount of the slowdown in unusual cases comes fromthe normalization step (n += 1) in hashnum. I replaced this with anexplicit test for 0:


  if (n == 0) return hashnode(t, 0);

This will also avoid hash collisions on very small keys. While thissimple-minded benchmark cannot be considered more than an indication,it seems that this patch improves performance in all cases exceptdenormalised values with SSE2 conversion.

References:
- Pentium 4 and misaligned doubles, Rici Lake
- Re: Pentium 4 and misaligned doubles, Mike Pall
- Re: Pentium 4 and misaligned doubles, Rici Lake

Prev by Date: Re: Pentium 4 and misaligned doubles
Next by Date: Re: Packaging and importing
Previous by thread: Re: Pentium 4 and misaligned doubles
Next by thread: Re: Pentium 4 and misaligned doubles
Index(es):
- Date
- Thread