[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: thought on Lua's hash-table invariants
- From: Norman Ramsey <nr@...>
- Date: Mon, 20 Apr 2020 17:32:02 -0400
A couple of months ago I spent some time studying Lua's hash tables,
preparing for a new course I'm teaching in September.  Given the
recent traffic, perhaps these thoughts would be worth posting here.
If anyone on the Lua team cares to comment, I would be delighted!
Norman
----------------------------------------------------------------
Lua hash table, perceived invariants:
The hash part is an array of N "spots," each of which contains
  key   - a Lua value; nil signifies an empty spot
  value - a Lua value, never nil
  next  - a (possibly NULL) pointer to another spot
The contents of a spot is called a "node."
N is a power of 2.
The table satisfies this invariant: 
  - If the is a set of size K > 0 of keys in the hash part that all
    hash to the same spot, then that set is exactly the set of keys
    reachable by starting with the key in the spot and following
    `next` pointers.
    As a corollary, if there are K > 0 keys that hash to the same
    spot, then one of those keys occupies the spot.
The spot that a key hashes to is that key's "main spot."
Here's how the invariant is maintained as keys are added to or removed
from the table:
  - When a key is removed, if it occupies its main spot, then the node
    in the spot pointed to by the `next` pointer, if any, is migrated
    into the main spot, and the key in the `next` spot is set to nil.
  - When a key is removed, if it doesn't occupy its main spot, then it
    is removed from the linked list that originates in the main spot,
    and its the key in its node is set to nil.
  - When a key is added, if its main spot is unoccupied, it is added
    to that spot with a NULL `next` pointer.
  - When a key-value pair (k, v) is added, if its main spot is
    occupied by key-value pair (k', v'), and if k and k' hash to
    different spots, then the main spot is overwritten with (k, v),
    and (k', v') is added.
    Why this recursion terminates: the number of keys occupying their
    main spots increases, and it's bounded above by the population of
    the table.
  - When a key-value pair (k, v) is added, if its main spot is
    occupied by key-value pair (k', v'), and if k and k' hash to
    the same spot, then we scan successive spots until we find an
    empty one, we put (k, v) there, and we link it into the list
    pointed to from the main spot.
    (We could equally well evict (k', v') and scan successive spots
    for it, but without any experiments to indicate what's best, we
    opt for less work.)
Incidentally, this organization allows for a move-to-front
optimization for colliding keys.  But we hope chains are so short that
this is not necessary.