lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Tue, Jan 3, 2012 at 23:43, William Ahern <william@25thandclement.com> wrote:
> On Wed, Jan 04, 2012 at 12:17:58PM +0900, Miles Bader wrote:
>> > I believe Lua needs to do the same. If it does not, it risks being
>> > deemed not suitable for writing software for the web. I don't know
>> > about you, but I for one would not like to see that happen.
>>
>> An "all characters" hash is not free, especially for _very_ long
>> strings (I regularly handle 500MB strings), and the benefit of such a
>> change needs to be weighed against the costs, weighted by the
>> likelihood of each case.  [This is especially true on less performant
>> hardware, and I think Lua is much more frequently used on such
>> platforms compared to more bloated/weighty languages like Python,
>> Ruby, etc.]
>
> For strings that large memory bandwidth constraints provide an opportunity
> to close the gap. I got within a factor of two--using GCC 4.6 with SSE
> enabled--for any large N by rejiggering the hash to load in batches and
> minimize intermediate dependencies
>
>        h0 = h ^ ((h<<5) + (h>>2) + src[0]);
>        h1 = h ^ ((h<<5) + (h>>2) + src[1]);
>        h2 = h ^ ((h<<5) + (h>>2) + src[2]);
>        h3 = h ^ ((h<<5) + (h>>2) + src[3]);
>        h4 = h ^ ((h<<5) + (h>>2) + src[4]);
>        h5 = h ^ ((h<<5) + (h>>2) + src[5]);
>        h6 = h ^ ((h<<5) + (h>>2) + src[6]);
>        h7 = h ^ ((h<<5) + (h>>2) + src[7]);
>        h = h0 | h1 | h2 | h3 | h4 | h5 | h6 | h7;
>
>        dst[0] = src[0];
>        dst[1] = src[1];
>        dst[2] = src[2];
>        dst[3] = src[3];
>        dst[4] = src[4];
>        dst[5] = src[5];
>        dst[6] = src[6];
>        dst[7] = src[7];
>
>        dst += 8;
>        src += 8;
>
> compared with the stock Lua code--compiled using same optimization flags and
> with GCC inlining its optimized memcpy. That's pretty good when you consider
> lua is doing a fixed 32 hash operations compared to N.
>
> Although perhaps this new algorithm totally ruins the key distribution. I
> have a law degree, not a math degree.
>
>

A thought: what if Lua allowed for the user to provide a different
hash function at build-time or when creating the Lua state? The
original function is good enough for the majority of use cases, but in
cases like this where excessive collisions could be an issue, you
could provide an alternate, maybe even using hardware-accelerated
hashing...

-- 
Sent from my toaster.