On 16-Aug-05, at 12:13 PM, Rici Lake wrote:

> I had to specify -msse2 for this to work on gcc 3.3.3

And it did make a bit of a difference, but only in some possibly unusual cases.

The pattern here is:

local t; local key=$1; local a = {[key] = 3}; for i = 1, 1e7 do local j = a[key] end
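
For reference, a driver along these lines reproduces the runs in-process (the original harness isn't shown; presumably each key expression was substituted for $1 and the chunk run under time, so this sketch using loadstring and os.clock won't match the real/user/sys figures below exactly):

  -- hypothetical driver; the key list is the one from the table below
  local keys = { "1", "27", "2^31-1", "2^31", "1e200", "1/0",
                 "2^-1022", "2^-1023", "true", "'a'", "{}", "function()end" }
  for _, k in ipairs(keys) do
    local chunk = assert(loadstring(
      "local t; local key=" .. k ..
      "; local a = {[key] = 3}; for i = 1, 1e7 do local j = a[key] end"))
    local t0 = os.clock()
    chunk()
    print(k, os.clock() - t0)   -- CPU seconds for this key
  end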

Full results, in seconds of user time (not converted to cycles), without subtracting the cost of the for loop itself (about 0.12 s):

                        unpatched              normalization patch (note 6)
key              no sse     sse     note         no sse     sse
1                 1.22      1.17    (1)           1.15      1.04
27                1.22      1.19    (2)           1.16      1.03
2^31-1            1.21      1.20                  1.15      1.04
2^31              5.67      1.12    (3)           5.64      1.11
1e200             5.65      1.13                  5.63      1.11
1/0               9.64      5.19    (4)           5.63      1.13
2^-1022           1.12      1.13                  1.07      1.11
2^-1023           4.67      7.50    (5)           1.07      8.23
true              0.52      0.50                  0.52      0.50
'a'               0.45      0.45                  0.45      0.44
{}                0.83      0.80                  0.83      0.80
function()end     0.82      0.81                  0.82      0.80

Conclusion: Lua is optimised for string lookups in hash tables and integer lookups in the array part. That seems appropriate to me. The penalty for use of an object is noticeable but not outrageous. Floating point keys are problematic in certain cases, but probably not common cases. In any event, replacing the floating point normalization with an explicit check for 0 is probably an improvement.

Notes:

0) The test does not account for hash collisions. Floating point keys are normalized by adding 1 to the key, so considerable precision will be lost for very small keys; in particular, all keys in the range (-2^-52, 2^-52) hash to the same place.
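
To see that effect from the Lua side (on an unpatched build; the table names and counts here are just for illustration), compare lookups of a key that shares its chain with many other tiny keys against lookups in a table whose keys hash apart:

  -- all of these keys lie inside (-2^-52, 2^-52), so n += 1 turns each
  -- of them into exactly 1.0 and they end up on a single hash chain
  local tiny, spread = {}, {}
  for i = 1, 1000 do
    tiny[2^-53 / i] = i     -- 1000 distinct keys, one chain
    spread[i + 0.5] = i     -- 1000 distinct keys, distinct hashes
  end
  local function probe(t, key)
    local t0 = os.clock()
    for i = 1, 1e5 do local j = t[key] end
    return os.clock() - t0
  end
  print("colliding key:", probe(tiny, 2^-53 / 1000))
  print("spread key:   ", probe(spread, 1000.5))

With the note 6 patch applied, the tiny keys hash from their raw bits instead, so the difference should mostly disappear.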

1) The pattern with key=1 does not give a an array part, so the lookup is in the hash part of a. Contrast:

local t; local key=1; local a = {3}; for i = 1, 1e7 do local j = a[key] end
        0.55 real         0.53 user         0.01 sys
local t; local key=1; local a = {[key] = 3}; for i = 1, 1e7 do local j = a[key] end
        1.22 real         1.22 user         0.00 sys

2) The SSE2 conversion does not seem to be measurably faster in the case where the conversion does not overflow (but see below).

3) 2^31 (and 1e200) cause integer overflow on double-to-int conversion. My guess is that the SSE2 instructions avoid a penalty in this case.

4) This is the penalty for computations with infinity. Apparently, SSE2 is used (if available) for the initial cast to integer, but not for the normalizing addition of 1 in the case where the hash is computed from the floating point representation.

5) 2^-1023 is denormalized. This seems to trigger a slow-down with the SSE2 conversion; at least, that's the only explanation I can come up with.

6) A considerable amount of the slowdown in unusual cases comes from the normalization step (n += 1) in hashnum. I replaced this with an explicit test for 0:

  if (n == 0) return hashnode(t, 0);  /* replaces n += 1; sends 0 (and -0) to a fixed node */

This will also avoid hash collisions on very small keys. While this simple-minded benchmark cannot be considered more than an indication, it seems that this patch improves performance in all cases except denormalised values with SSE2 conversion.
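
Presumably the point of the n += 1 normalization was to make 0 and -0 (which compare equal) hash identically; the explicit test for 0 keeps that property, and it is easy to sanity-check from Lua:

  -- illustrative check only: 0 and -0 must behave as a single key
  local t = {}
  t[0] = "zero"
  assert(t[-0] == "zero")
  t[-0] = "minus zero"
  assert(t[0] == "minus zero")   -- overwrites, does not add a second entry
  print("0 and -0 share one table entry")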