Re: question on hash for objects such as tables and functions

Le mer. 27 mai 2020 à 11:09, Francisco Olarte <folarte@peoplecall.com> a écrit :

On Tue, May 26, 2020 at 11:30 PM Philippe Verdy <verdyp@gmail.com> wrote:
> So now you can confirm that the factor 129 is not suitable as is is not prime (divisible by 3) even if it's a Mersenne number (=2^7+1).

To be strict, Mersenne numbers are -1, that +1 is an extension you
have put without proof.

Wrong! I'm not inventing them !

There are two series of Mersenne numbers:

* Mn(p) = 2^p-1, often just noted M(p), and

* Mp(p) = 2^p+1

And the current implementation uses Mp(7) = 129, which is a Mersenne of the second category (this does not matter at all for hash collisions, and if you think it is not Mersenne for their strict definition, it also does not matter at all) but it is not prime (this really matters for Knuth's algorithm).

The only interest of Mersenne numbers (in both series) is that they are simple to compute in arithmetic operations with very large integers (e.g. a single substraction for Mn(p) or a single addition for Mp(p), both being equally complex in terms of computability). Both series are used in primality tests, even if the Mn(p)=2^p-1 series is the most commonly studied one (e.g. in mersenne.org for searches of big Mersenne primes).

Anyway, what can I confirm 129 is not suitable for?

It is NOT suitable for Knuth's algorithm because it is not prime (divisible by a **ridiculously small** factor 3)

> But the current implementation still uses it as the factor: (h<<5)+(h>>2) is the same as (h<<7 + h)>>>2 = (h*128+h)>>>2 = (h*129)>>>2; this way of writing it with the sum of shifts in two directions avoids discarding 2 high bits with (h<<7)

So, you affirm it is the same only to prove it is not the same in the
same sentence?

What??? I'm saying that they are arithmetically equivalent (you can check it yourself) if you ignore the limited bitlength. The difference is when integers have limited bitlength, because shifted bits will be discarded.

...
> Knuth designated good multipliers for several wordsizes. You deliberately ignored it just like you deliberately chose to ignore the primality requiement of the factor: this is a proof that you've not understood the Knuth algorithm and how and why it really works as a good (but still fast) hash.

.... This "you deliberately" seems to indicate you are not replying to
me, although you are doing it in ( top posted as usual, something
smells rotten in Denmark ) response which quotes me.

You directly support this code even if it's evidently incorrect. So you're directly part of the problem.

> So in this piece of code,

Yep, you are totally loosing direction. No piece of code, no context.

Wrong: I gave the code: one line only where

- only 1 character "+" needs to be changed to "-" (for the first goal required by Knuth),

- or 3 characters (for the second goal of shuffling "5" and "2" have to be changed to "4" and "3", so that shuffling is balanced faster without depending on prefix length, or so that different positions of common substrings will impact the hash, and this is false here).

Additionally I showed that the order of operations for rounds is wrong!

- you currently multiply the current hash by the prime constant, then add the new unit, then shuffles by a rotate (by an incorrect number of bits 2, as it is a divisor of the bitlength of integers) before compressing by xoring into the current hash

- it should first add the unit to the current hash, then multiply by the prime contant, then shuffle by rotate (preferably rotating by a small prime number of bits 3, not 2), before compressing by xoring into the current hash.