• Subject: Re: Hash Table Collisions (n.runs-SA-2011.004)
• From: William Ahern <william@...>
• Date: Wed, 4 Jan 2012 12:34:42 -0800

```On Wed, Jan 04, 2012 at 11:22:05AM -0800, G?? Weijers wrote:
> On Wed, Jan 4, 2012 at 10:04 AM, William Ahern
> <william@25thandclement.com>wrote:
>
> > On Tue, Jan 03, 2012 at 10:43:07PM -0800, William Ahern wrote:
> > <snip>
> > >       h0 = h ^ ((h<<5) + (h>>2) + src[0]);
> > >       h1 = h ^ ((h<<5) + (h>>2) + src[1]);
> > >       h2 = h ^ ((h<<5) + (h>>2) + src[2]);
> > >       h3 = h ^ ((h<<5) + (h>>2) + src[3]);
> > >       h4 = h ^ ((h<<5) + (h>>2) + src[4]);
> > >       h5 = h ^ ((h<<5) + (h>>2) + src[5]);
> > >       h6 = h ^ ((h<<5) + (h>>2) + src[6]);
> > >       h7 = h ^ ((h<<5) + (h>>2) + src[7]);
> > >       h = h0 | h1 | h2 | h3 | h4 | h5 | h6 | h7;
> >
>
> Note that h0, h1, h2 etc all get the same value of the expression '(h << 5)
> + (h >> 2)', which may not be what you want......

It's still better than just skipping over most of them. But principally it
allows each shift expression to be computed once over the eight statements.
I didn't really put much thought into it except to try to make it easy for
the compiler to optimize.

What should be removed though is the serial xor with h. There's an even
number so it's a complete waste. h should just be xor'd in at the end with
the others.

There's certainly a lot more wrong with it. I was just trying to show that
you can do hash ops over _every_ byte and still have almost equivalent
performance with the fixed 32 char sampling scheme because, fundamentally,
string interning is O(N).

Hashing every char doesn't solve the collision attack issues, of course. It
probably doesn't even make it much more difficult from a cryptographer's
perspective given the simplicity of the mixing.

```