lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Other similar implementations of Knuth's hash algorithm for 64-bit wordsize use a constant multiplier equal to
* 2,654,435,761 = 0x9E3779B1 = 0b1001_1110_0011_0111_0111_1001_1011_0001
However the number of alternances of sequences of 0 or 1 is only 15 (so it is not optimal, it should be 31: too many sequences with 2 and 3 equal bits, and even 2 sequences of 4 equal bits), even if  2,654,435,761 is a prime.

Several implementation used a "gold number" incorrectly set to 265,445,761 which is wrong (not a prime, multiple of 7), from a typo or misreading of the Knuth's suggested multiplier for 64-bit words.

For 64-bit wordsize we can get better distribution of bits of the input using more alternance of bits, this is easy to do.

If one wants to strengthen the Knuth's algorithm, after summing the input to the current hash value, you can just take several rounds of multiplication by a prime and rotate by an odd prime number of bits (should still be 3, no interest to rotate by more than 3 bits), and you may choose different prime constants for each round.

Basically this is exactly what SIPhash does with its multiples rounds. SIPhash uses 4 rounds, all using the same multiplier constant, but it also requires that intermediate steps use twice larger words; the extra high bits at end of rounds are then XOR'ed after shighting them down, to reduce it to the word size. I am convinced that its constant prime factor is not optimal, that's why it requires multiple rounds (but it is slow) to compensate this effect. And the fastest (on some platforms with large enough L1 data caches) implementation of SIPhash to avoid loops requires using lookup "S-tables". It is difficult to optimize and port correctly.





Le lun. 25 mai 2020 à 19:55, Philippe Verdy <verdyp@gmail.com> a écrit :
What I describe here is not slower than the current implementation. It is also a "add-multiply-rotate-xor" hashing function (like SIPhash) but with a simpler pragmatic design.

I'm not even convinced that SIPhash is really optimal (I've not check what was its equivalent multiplication factor, if it exists, to see if it's **really** a prime) and its implementation is more complex (with a complex series of loops for each step), than the single line I propose.

SIPhash uses a seed which is a static string, but here we can keep the variable "seed" (which is combined with the string length like in SIPhash) and that can be an arbitrary integer (that the implementation of table could initialize at startup with a random number, and kept in a static variable as long as the Lua VM is running or is not reset for for other independant Lua script runs.

My proposal (unlike SIPhash) does not need any lookup table to perform the compression step at each loop (and SIPhash uses multiple loops per input unit, here I just use one loop per unit).

My proposal also allows processing units larger than one byte, to process the string with aligned word reading (replace cast_byte(str[l]) by cast_unit(str[l]): this can be used for the first word of the string and the last word (padded with zeros using byte reading).

* Short strings that are smaller than 2 words (smaller than 16 bytes on a system with 64-bit uint's) can be processed by a simple loop using only byte reading, without taking care of alignment and byte order.
* Long strings that are 2 words or longer can always be processed by hashing the first word, then an aligned loop every "step" words (where the last processed word in the loop will contain up to 8 bytes at end of string, followed by the remaining bytes at end of string that will fill an aligned word...

The step used in the loop can be adjusted to reduce the hashing time: it's easy to termine the step value that ensures that a given maximum number of loop will remain low (for example between 32 and 64) for any string length. The starting N bytes of the string and the ending N bytes of the string (where N is the wordsize) will always be hashed, just like the string length.





Le lun. 25 mai 2020 à 19:17, Philippe Verdy <verdyp@gmail.com> a écrit :
No even with step=1 the transform of h is wrong. As I said (h<<5)+(h>>2) is equivalent to (h<<7 + h) (with preservation of the two highest bits) followed by a rotation of 2 bits to the right. and (h<<7+h) is equivalent to (h*129) so the effect is that the transform multiplies by a non-prime (129 which is divisible by 3), followed by a rotation of 2 bits (which is also wrong because the word size is a multiple of 2, so you have divided the hash collision search space by half the wordsize).

This should immediately be changed to (h<<4)-(h>>3): this is equivalent to (h*127)>>>3, where 127 is prime (but not optimal as it is a simple Mersenne that still does correctly distributes all bits of the input)

As well you use a multiplication before just adding a new byte, so hashes of strings with equal lenght 'a','b','c' are always successive integers.

You should  really first add the new byte to h, then perform the multiplication (by 127, not 129 if you still want to use a Mersenne number where the multiplication is just an addition or difference of the input and the shifted input!) and then the rotation (by 3 bits, not 2!)

This won't make any different in terms of local time to compute a single hash. But the hash will be MUCH stronger (because you used a prime for the multiplication factor and used a rotation by a odd prime number to that all bits are equally distributed across loops).

And as I said, Mersenne primes are not the best primes for the constant factor: it is only a workaround solution for very small devices that don't have a fast multiplication of two integers. you should use a prime that has alternates the highest number of sequences of 1 and sequences of 0 (ideally a binary number of the form "101010...101" and only if it is a prime.

It appears that the hex number 0x5555 = 0b0101_0101_0101_0101 (for a 16-bit multiplier, suitable for a 32-bit wordsize for the hash) = 21845 is not a prime as it is divisible by a small factor 5.

So you need to allow two successive bits being set to 1:
* 0b1101_0101_0101_0101 = 0xD555 = 54613 : still not a prime
* 0b1011_0101_0101_0101 = 0xB555 = 46421 : still not a prime
* 0b1010_1101_0101_0101 = 0xAD55 = 44373 : still not a prime
* 0b1010_1011_0101_0101 = 0xAB55 = 44861 : still not a prime
* 0b1010_1010_1101_0101 = 0xAAD5 = 43733 : still not a prime
* 0b1010_1010_1011_0101 = 0xAAB5 = 43701 : still not a prime
* 0b1010_1010_1010_1101 = 0xAAAD = 43693 : still not a prime
* 0b1010_1010_1010_1011 = 0xAAAB = 43691 : IT IS A PRIME ! (see https://primes.utm.edu/lists/small/10000.txt)

But of course NOT a Mersenne (so cannot be computed with a single addition and two shifts like in the existing code which uses a non-prime Mersenne of the form 2^p+1; with the factor 127, where you've just replaced the "+" by a "-", you use a prime Mersenne of the form 2^p-1, but it's not optimal in terms of propagation of bits of the input in the hash value).

So the optimum solution (for 32-bit wordsize) for fast hashing and the lowest level of collisions is finally this line of code replacing the existing one:

   h ^= ((h + cast_byte(ptr[l])) * 0xAAAB)>>>3

Note that even if the multiplication of (h+b) by 0xAAAB cannot be reduced by two shifts and an addition it can be realized that this is like multiplying (h+b) by 0xAA, keeping that result, then shifting by 8 bit and adding the kept result and finally adding the initial h. And to multiply (h+b) by 0xAA, you can split it in two parts as well by shifting by 4 bits so that you need just multiply by 0xA (which is simply the sum of the input shifted by 1 or 3 bits) So in fine, it can be implemented as a few additions and shifts on platforms that don't have fast multiplication (but I bet that even in this case, a single multiplication operation will be faster than using these multiple shifts and additions; the C compiler will generate the code itself to support these multiplications of integers)

You can make similar search for 16-bit or 64-bit wordsize, it won't take long to find a suitable prime that maximimizes the number of alternating sequences of bits set to 1 and 0. The same line of code above will be used, only the constant 0xAAAB will be different.


Le lun. 25 mai 2020 à 15:10, Andrea <andrea.l.vitali@gmail.com> a écrit :
But DoS attacks are a concern and we can prevent it significantly with a correct hash (the current implementation is not)

It is actually quite easy to fix this in your Lua installation 

1) keep the hash as it is, set the step to 1 for long strings (and accept the corresponding slowdown)

2) change the hash to some keyed version, the key is the seed, so that hash is unpredictable; example Siphash is unpredictable  even if one could access the strings and the hashes (but is not easily portable, and very slow)

    Andrea 
--
Andrea Vitali