lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Rehashiong a table is only necessary when:
(1) you want to change the maximum number of hash buskets,
(2) you want to change the hashing function to produce another distribution.

In both cases it is justified:
(3) if the collision lists in are overlong, which means that the hasing function was not flattening enough the distribution (was not strong enough) which can only occur when there are too many elements in the table compared to the number of hash buckets.
(4) there are too many empty hash buckets, which occurs when there are too few elements in the table compard to the number of hash buckets.

These last two justifications should be handled by the table implementation which should compute the correct number  of hash buckets, and then rehash only when needed (but not too often: this should be controled by some tuning metrics like minimum and maximum fill factors (a good minimum should be about 10%, but the maximum should be around 400%, where the fill factor is the ratio of total hashbuckets divided by the effective cardinality of keys in the table). But it does not apply to keys that are part of the integer sequence (which follow other rules and should never be hashed at all).

Beside that, when there are some non unitary collision list, rehashing may be justified for security reasons (against time-based side-channel attacks) only if the hashing function is very poor, i.e. when condition (3) is met, but sometimes more often if one wants a protection to defeat predictions, by regularly changing the hash function (for example you can change the prime multipliers used in Knuth modular hash function, by selecting them from a large enough known set of big primes)

However Lua's basic tables do not provide any way to change the hashing function, which is implemented internally by the implementation of the Lua engine itself: I think all these problems should be handled directly by Lua without having Lua programs to do that. Other languages (e.g. Java) allow controling the hasing function and you only need to rehash if you change the hash function (in all other cases, notably metrics of collision lists and fill factors should be always be part of the internal implementation, because an hashing function created in Lua would be quite slow, an in fact quite difficult to write correctly with Lua's dynamic polymorphic data types and also because you cannot hash non-simple types for keys, like tables, functions, or userdata that are only known by their internal reference pointer, not accessible to lua programs; but it is possible if your keys are strings or numbers outside integer sequences from 1 to N).