lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


I could do that, but two functions   luaS_hash() and luaS_hashlongstr() make no sense: they should be just one.
And if something must be done, it's an alternative no longer using any hashtable with collision lists, but only B-trees.

This still also requires another CORRECT hashing function (the current version is wrong according to  Knuth's "fast hash" algorithm that it was supposed to implement, but in fact it uses a non-prime (129) that has a very small divisor (3) and this is even worse because you rotate at each round by steps of 2 bits (so not all bits are distributed: the rotation should also be prime and should be 3 bits).
Knuth explicitly stated in his published papers that his algorithm was valid ONLY if the factor and the rotation were primes (and preferably a large factor, the largest that can work with multiplications, and certainly not any Mersenne even if it's prime, the rotation must be prime according to the word size but rotating by 2 bits is incorrect because wordsizes are always multiple of 2), otherwise it no longer has any of the needed properties for an fast hashed access.

Immediately I don't see why you can't compare by replacing a "+" by a "-" and replacing the existing rotation of 2 bits to the left by a rotation of 3 bits (to implement the rotation, it currently shift the lowest bits to the left by N before adding the highest bits shifted to the right by 2 bits; you jsut need to replace that N by N-1 and replace the 2 by 3. you can instantly see that this does not change the performance of the function, the generated code is nearly the same, but you instantly removed many collisions and any benchmark will show that the average access time in tables (not filled as a sequence) is faster.

This change has no effect on tables filled as pure sequences of keys in order from 1 to N and with non-nil values.

Another really different implementation will be longer to develop and will require more tests. But this small change can fit in the short schedule for Lua 5.4. It's easy to do, does not change anything else in the code than just ONE LINE.




Le mar. 26 mai 2020 à 18:31, Andrea <andrea.l.vitali@gmail.com> a écrit :

For large tables (with many keys or where keys are added/removed often and randomly), Lua is extremely poor compared to ...
See the current situation in Wikimedia Commons ...
 
Can you share benchmarks and numbers to prove the point?

If you want to change the hash function in your Lua implementation, this is very easily done: just edit the lstring.c and rebuild your executable. 
Edit the luaS_hash() and luaS_hashlongstr() to change the way hash is computed and how long strings are sampled.
Then perform some benchmark and let's see if the predicted improvement is there.

   Andrea


--
Andrea Vitali