• Subject: thought on Lua's hash-table invariants
• From: Norman Ramsey <nr@...>
• Date: Mon, 20 Apr 2020 17:32:02 -0400

```A couple of months ago I spent some time studying Lua's hash tables,
preparing for a new course I'm teaching in September.  Given the
recent traffic, perhaps these thoughts would be worth posting here.

If anyone on the Lua team cares to comment, I would be delighted!

Norman

----------------------------------------------------------------
Lua hash table, perceived invariants:

The hash part is an array of N "spots," each of which contains

key   - a Lua value; nil signifies an empty spot
value - a Lua value, never nil
next  - a (possibly NULL) pointer to another spot

The contents of a spot is called a "node."

N is a power of 2.

The table satisfies this invariant:

- If the is a set of size K > 0 of keys in the hash part that all
hash to the same spot, then that set is exactly the set of keys
reachable by starting with the key in the spot and following
`next` pointers.

As a corollary, if there are K > 0 keys that hash to the same
spot, then one of those keys occupies the spot.

The spot that a key hashes to is that key's "main spot."

Here's how the invariant is maintained as keys are added to or removed
from the table:

- When a key is removed, if it occupies its main spot, then the node
in the spot pointed to by the `next` pointer, if any, is migrated
into the main spot, and the key in the `next` spot is set to nil.

- When a key is removed, if it doesn't occupy its main spot, then it
is removed from the linked list that originates in the main spot,
and its the key in its node is set to nil.

- When a key is added, if its main spot is unoccupied, it is added
to that spot with a NULL `next` pointer.

- When a key-value pair (k, v) is added, if its main spot is
occupied by key-value pair (k', v'), and if k and k' hash to
different spots, then the main spot is overwritten with (k, v),

Why this recursion terminates: the number of keys occupying their
main spots increases, and it's bounded above by the population of
the table.

- When a key-value pair (k, v) is added, if its main spot is
occupied by key-value pair (k', v'), and if k and k' hash to
the same spot, then we scan successive spots until we find an
empty one, we put (k, v) there, and we link it into the list
pointed to from the main spot.

(We could equally well evict (k', v') and scan successive spots
for it, but without any experiments to indicate what's best, we
opt for less work.)

Incidentally, this organization allows for a move-to-front
optimization for colliding keys.  But we hope chains are so short that
this is not necessary.

```