lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Juris Kalnins wrote:
> I was thinking how much it is possible to gain by searching for hash  
> function
> that gives least number of lower bit collisions on the set of keys that
> are actually used together in the application. (e.g. names of modules,  
> names of metamethods, ...)

Yes, that's what I did. And it had less collisions. Unfortunately,
I haven't tested on a wide enough range of strings.

Selecting only up to 3*4 bytes was fast, but not good enough. See
also the recent post from Florian Weimer: whatever we use, it'll
collide on _some_ set of strings. Full hashing is no exception, so
we might as well stay with partial hashing and optimize for common
usage patterns.

To proceed, one would have to gather different collections from
real-life apps. And then vary the different ways to do a partial
hash and the hash function itself (the Jenkins hash wasn't so bad).

If you come up with anything, please post your results.

> Also, if I understand correctly, LJ 2 can much optimize table lookups if  
> the key lives at is 'primary' position?

You mean the HREFK thing? No, it specializes constant key lookups
to a hash slot number, not the chain position. It doesn't care
where in the chain the key is (it doesn't look at node->next
either). The HREFK optimization is not affected by hash quality.

That said, hash quality _is_ important, because string interning,
all lookups in the interpreter and variable key lookups in
compiled code are still affected.

--Mike