[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Microbenching table lookup (Was Re: Pentium 4 and misaligned doubles)
- From: Rici Lake <lua@...>
- Date: Tue, 16 Aug 2005 12:58:28 -0500
On 16-Aug-05, at 12:13 PM, Rici Lake wrote:
I had to specify -msse2 for this to work on gcc 3.3.3
And it did make a bit of a difference, but only in some possibly
unusual cases.
The pattern here is:
local t; local key=$1; local a = {[key] = 3}; for i = 1, 1e7 do local j
= a[key] end
Full results, not converted to cycles (only user time), and without
subtracting the cost of the for loop itself (about 0.12)
normalization patch (note 6)
key no sse sse note no sse sse
1 1.22 1.17 (1) 1.15 1.04
27 1.22 1.19 (2) 1.16 1.03
2^31-1 1.21 1.20 1.15 1.04
2^31 5.67 1.12 (3) 5.64 1.11
1e200 5.65 1.13 5.63 1.11
1/0 9.64 5.19 (4) 5.63 1.13
2^-1022 1.12 1.13 1.07 1.11
2^-1023 4.67 7.50 (5) 1.07 8.23
true 0.52 0.50 0.52 0.50
'a' 0.45 0.45 0.45 0.44
{} 0.83 0.80 0.83 0.80
function()end 0.82 0.81 0.82 0.80
Conclusion: Lua is optimised for string lookups in hash tables and
integer lookups in the array part. That seems appropriate to me. The
penalty for use of an object is noticeable but not outrageous. Floating
point keys are problematic in certain cases, but probably not common
cases. In any event, replacing the floating point normalization with an
explicit check for 0 is probably an improvement.
Notes:
0) The test does not account for hash collisions. Floating point keys
are normalized by adding 1 to the key, so considerable precision will
be lost for very small keys; in particular, all keys in the range
(-2^-52, 2^-52) hash to the same place.
1) The pattern with key=1 does not create a as an array, so the lookup
is in the hash part of a. Contrast:
local t; local key=1; local a = {3}; for i = 1, 1e7 do local j = a[key]
end
0.55 real 0.53 user 0.01 sys
local t; local key=1; local a = {[key] = 3}; for i = 1, 1e7 do local j
= a[key] end
1.22 real 1.22 user 0.00 sys
2) The SSE2 conversion does not seem to be measurably faster in the
case where the conversion does not overflow (but see below).
3) 2^31 (and 1e200) cause integer overflow on double-to-int conversion.
My guess is that the SSE2 instructions avoid a penalty in this case.
4) This is the penalty for compuations with infinity. Apparently, SSE2
is used (if available) for the initial cast to integer, but not for the
normalizing addition of 1 in the case that the hash is computed from
the floating point representation.
5) 2^-1023 is denormalized. This seems to trigger a slow-down with the
SSE2 conversion; at least, that's the only explanation I can come up
with.
6) A considerable amount of the slowdown in unusual cases comes from
the normalization step (n += 1) in hashnum. I replaced this with an
explicit test for 0:
if (n == 0) return hashnode(t, 0);
This will also avoid hash collisions on very small keys. While this
simple-minded benchmark cannot be considered more than an indication,
it seems that this patch improves performance in all cases except
denormalised values with SSE2 conversion.