lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


free nodes pointed only by "last free" is not the best solution: when freeing nodes, you can as well chain them.
But as I said, storing pointers for the chain is absolutely not necessary: you can just chain by advancing to the next candidate node by a static prime number (something that works very well for hash tables that are not *both* very large in their maximum size, and not too much full

If your maximum fill level is 85%, you are warrantied to still have at least 15% free nodes in the table (i.e. at least one free node every 6.67 nodes), and you'll find the free node by this advance (especially if keys were hashed with a strong-enough hashing function, that will then distributes their "main" position randomly throughout the table (in that case your scanning with a static prime constant will fond a free node on average in at most 6.67 lookups; this could be worse only if the hashing function is not randomly distributing the bits of in the hash value they return; but even if the hashing function for keys that are simple integer numbers is just returning their value modulo some constant which should be higher than the hash table size, making steps by the large prime, modulo the table size, will still avoid most of the time long chains for table lookups, except if keys were specially chosen so that they match the order modulo this large prime and modulo the table size.

Using a strong enough hashing function (not necessarily a cryptographically strong message digest) will avoid this. If the the hashing function if using the simple algorim "multiply current hash value by a prime then add the next byte" (Dijkstra's "fast hashing" algorithm), this hashing should use a prime constant factor that is distinct from the large prime used for chaining nodes for lookups in the hash table. Typically such hashing function uses a '"not-so-large" prime (because it allows computing the multiplications without overflows or without having to handle large integers), so typical hshing functions using this "multiply-and-add" algorithm will use a constant factor which should be a positive 15-bit to 31-bit integer, but the constant prime used for advancing in collision chains can be larger (advancing in the collision chain does not require any multiplication, only an addition modulo the table size (for this last modulo, the table size is not necessarily a power of two if we want to handle large hash tables without excessive allocated sizes with too many free nodes, i.e for any table sizes larger than about 16 nodes that should be at least 15% to 25% full).

If your Lua application however is for a critical mission and you want to absolutely avoid the worst cases causing severe degradation of the response time or degradation of performance of the server caused by the hashing of arbitrary user-provided data, or if the hashed keys contains private data that should be protected absolutely, you have no other choice than choosing a cryptographically strong hashing function that is very resistant to collisions in their returned hash values (at least SHA1 which returns enough good bits usable for the hash value; if your cryptographically strong message digest returns more bits than needed, just XOR the extra bits from the digest to form the smaller integer for the hash value, which should preserve or enhance the flat distribution of hash values better than if you just truncate the extra bits from the strong message digest). And in that case, the hashtable lookup can still use collision chains using simple additions for advancing to the next candidate position.

If your hash table is not too full (less than 85%) you are then warrantied in that case (when you use a strong hashing of keys) to locate a free node by performing a lookup using the same additive steps by advancing by the large prime constant (modulo the hash table size) in very few steps (at most 7 lookup steps in the chain for the maximum fill level 85%, and rarely 8 or 9 steps caused by some minor defects in the non-purely flattened distribution of hash values).

The basic function used by Lua for its hash table hwoever is not very strong (it is made to be fast): one way to still use Lua tables for critical data is to first create your own strong digest for any private key that you want to secure against malicious collisions, and then append this digest to the unique key to generate the actual key to be used in table, instead of indexing the table directly by this private key (such appending will still preserve the uniqueness). I wonder however if Lua allows an application to specify its own secure hashing function for storing/reading (key=value} pairs in its tables: is there a metamethod we can use in tables? (setting this methamethod to a different hashing function should instantly cause the reashing of all keys already present in that table).



Le lun. 27 avr. 2020 à 15:36, Roberto Ierusalimschy <roberto@inf.puc-rio.br> a écrit :
>  > I spent a bit of time worrying that in use cases where many keys are
>  > inserted and removed from a table while the size of the table remains
>  > more or less the same, like the hash table attached to a heap that
>  > supports update-key in A* or Dijkstra or an LRU cache, the performance
>  > may degrade. My mind was put at ease when Roberto shared that all this
>  > won't matter anyway, because for use cases like these the table will
>  > be regularly rehashed even if it does not change size.
>
> What triggers the rehash?  Is it the movement of the `lastfree` pointer?

Yes. 'lastfree' only moves in one direction (from the end of the table
to its beginning). Although it may have to skip several slots to find
a free one, the amortized cost is O(1), as it will pass N slots to
insert N keys. When 'lastfree' hits the beginning of the table, and
therefore it cannot find a free slot, there is a rehash.

When doing a rehash, Lua counts the number of slots actually in use,
that is, with non-nil values, and computes the new size for the table.
(That computation includes the size of the array part, too.) This
new size can be larger (the typical case), smaller, or equal to the
previous size. After the rehash, 'lastfree' is reset to the end of
the table abain.

-- Roberto
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org