lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


A table size for 16 elements: 2^4-1=15 is Mersenne, but is a multiple of 3 and 5, so collisions occur for objects whose differences of addresses is frequently a multiple of 3 or 5 (e.g. allocating many objects with size 10, or 12 or 20).
A table size for 32 elements: 2^5-1=31 is Mersenne, but is prime (no problem here).
A table size for 64 elements: 2^6-1=63 is Mersenne, but is a multiple of 3, so collisions occur for objects whose differences of addresses is frequently a multiple of 3 (e.g. allocating many objects with size 12)
A table size for 128 elements: 2^7-1=127 is Mersenne, but is prime (no problem here).
A table size for 512 elements: 2^9-1=511 is Mersenne, but is a multiple of 3.
A table size with 256 elements: 2^8-1=255 is Mersennen but is a multiple of 5

The current limitation of table sizes to a power of 2 offers NO protection at all against collision of pointers to large sets of objects allocated with the same size, Mersenne numbers (2^n-1) frequently have small divisors (3, 5, 7 beging the most common).


Le mar. 26 mai 2020 à 03:17, Philippe Verdy <verdyp@gmail.com> a écrit :
Note that not all Mersenne numbers 2^p-1 are primes (even if they are not powers of 2, nor multiples of power of 2, like aligned addresses)

But it's true that the lower bits of aligned adresses have no use in the final hash and could be stripped before using the modulo. But then the stripped address may be a multiple of the Mersenne if the Mersenne is not prime.

For example 2^16-1 is a multiple of 5, so addresses that are 5 times the alignment size for addresses will collide frequently in a table with 2^16 nodes.

The only way to avoid this is to multiply the stripped-down address (converted to integer without the low bits always equal to 0) by a large enough **non-Mersenne** prime like 0xAAAB to "randomize its bits", and then use that result as the hash value of the address.

You DON'T need the (modulo tablesize-1), just the (modulo tablesize) for converting the hash value to a main slot number in the table, like for all other hashes, i.e. just need to mask the high bits if the table size is still limited to a power of 2: that is something that should change for large tablesizes as this is a severe limitation that wastes memory but which is still applicable, given your comment saying that tablesize-1 is a Mersenne (this should not even be true if this limitation of tablesizes=2^p is removed above some value of p, where tablesizes could just be a multiple of 2^p).

This multiplication by a large non-Mersenne prime will also work even if addresses of objects (tables, functions, or userdata whose mutable content is not hashed at all) are not aligned at all (on 8-bit systems that have no alignment constraint at all, but follow some incremental patterns with many multiples of the same base and a constant offset (this occurs when allocating many objects of the same size): the differences of addresses, ie. the base could easily be a small multiple or submultiple of the (tablesize-1) which is most often NOT a prime and may have small divisors.





Le lun. 18 mai 2020 à 18:53, Roberto Ierusalimschy <roberto@inf.puc-rio.br> a écrit :
> the hash for objects such as tables and functions seems to be the memory
> address at which they are stored, this is indeed unique but due to
> alignment when it is reduced modulo 2^n (the size of the table) it may
> cause many collisions
>
> does Lua do something to randomize the memory address or the hash derived
> from the memory address?

For these objects, the final hash is done modulo table size - 1, which
is a Mersene number.

-- Roberto
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org