[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Hash Table Collisions (n.runs-SA-2011.004)
- From: William Ahern <william@...>
- Date: Tue, 3 Jan 2012 22:43:07 -0800
On Wed, Jan 04, 2012 at 12:17:58PM +0900, Miles Bader wrote:
> > I believe Lua needs to do the same. If it does not, it risks being
> > deemed not suitable for writing software for the web. I don't know
> > about you, but I for one would not like to see that happen.
>
> An "all characters" hash is not free, especially for _very_ long
> strings (I regularly handle 500MB strings), and the benefit of such a
> change needs to be weighed against the costs, weighted by the
> likelihood of each case. [This is especially true on less performant
> hardware, and I think Lua is much more frequently used on such
> platforms compared to more bloated/weighty languages like Python,
> Ruby, etc.]
For strings that large memory bandwidth constraints provide an opportunity
to close the gap. I got within a factor of two--using GCC 4.6 with SSE
enabled--for any large N by rejiggering the hash to load in batches and
minimize intermediate dependencies
h0 = h ^ ((h<<5) + (h>>2) + src[0]);
h1 = h ^ ((h<<5) + (h>>2) + src[1]);
h2 = h ^ ((h<<5) + (h>>2) + src[2]);
h3 = h ^ ((h<<5) + (h>>2) + src[3]);
h4 = h ^ ((h<<5) + (h>>2) + src[4]);
h5 = h ^ ((h<<5) + (h>>2) + src[5]);
h6 = h ^ ((h<<5) + (h>>2) + src[6]);
h7 = h ^ ((h<<5) + (h>>2) + src[7]);
h = h0 | h1 | h2 | h3 | h4 | h5 | h6 | h7;
dst[0] = src[0];
dst[1] = src[1];
dst[2] = src[2];
dst[3] = src[3];
dst[4] = src[4];
dst[5] = src[5];
dst[6] = src[6];
dst[7] = src[7];
dst += 8;
src += 8;
compared with the stock Lua code--compiled using same optimization flags and
with GCC inlining its optimized memcpy. That's pretty good when you consider
lua is doing a fixed 32 hash operations compared to N.
Although perhaps this new algorithm totally ruins the key distribution. I
have a law degree, not a math degree.
- References:
- Re: Hash Table Collisions (n.runs-SA-2011.004), TNHarris
- Re: Hash Table Collisions (n.runs-SA-2011.004), Mark Hamburg
- Re: Hash Table Collisions (n.runs-SA-2011.004), Tom N Harris
- Re: Hash Table Collisions (n.runs-SA-2011.004), Mark Hamburg
- Re: Hash Table Collisions (n.runs-SA-2011.004), Vladimir Protasov
- Re: Hash Table Collisions (n.runs-SA-2011.004), Leo Razoumov
- Re: Hash Table Collisions (n.runs-SA-2011.004), Vladimir Protasov
- Re: Hash Table Collisions (n.runs-SA-2011.004), Miles Bader
- Re: Hash Table Collisions (n.runs-SA-2011.004), Ashwin Hirschi
- Re: Hash Table Collisions (n.runs-SA-2011.004), Miles Bader