• Subject: Re: thought on Lua's hash-table invariants
• From: Roberto Ierusalimschy <roberto@...>
• Date: Tue, 21 Apr 2020 15:30:19 -0300

```> If anyone on the Lua team cares to comment, I would be delighted!

My pleasure! :-)

Before we go on, note that, although the main algorithm has remained the
same during the years, several details of the kind you are discussing
have changed across versions. I will use the current version (5.4 :-)
as the reference here.  If you used another version for your analysis
(2.5?), let me know.

> ----------------------------------------------------------------
> Lua hash table, perceived invariants:
>
> The hash part is an array of N "spots," each of which contains
>
>   key   - a Lua value; nil signifies an empty spot
>   value - a Lua value, never nil
>   next  - a (possibly NULL) pointer to another spot

"Empty spot" may have different meanings, depending on what is being
done. For practically all tasks, a value==nil means the node is free.
The only restriction on its "freedom" is that it may be already part
of a list, so we cannot change its 'next' field. So, when looking
for a free node to link into a list, we must look for a node with
key==nil (a "virgin" node, which was never part of any list). For
everything else, value==nil is what is checked.

> The table satisfies this invariant:
>
>   - If the is a set of size K > 0 of keys in the hash part that all
>     hash to the same spot, then that set is exactly the set of keys
>     reachable by starting with the key in the spot and following
>     `next` pointers.

As discussed here recently, two lists may merge. (But I think I was
wrong when I said this must be the case.) Assume node N2 is in the list
of node N1 (so, it is not in its main position). Now, N2 is set to nil
(and therefore it is now free, although still part of the N1 list).
Now, a new key being inserted hashes to N2. As N2 is free (because its
value==nil!), this new key will go into N2, and then its list will merge
with the list of N1.

We might choose not to reuse N2, considering free to reuse only nodes
with key==nil. In general, we do not consider uses that removes entries
from tables "hot uses" (using a table as an array or a record
creates only "hot uses"), and so we think any performance differences
in those uses will be marginal for most programs. Moreover, it is
difficult to know which strategy would be "better". Reusing N2
may create longer lists, but it postpones rehashes.

>     As a corollary, if there are K > 0 keys that hash to the same
>     spot, then one of those keys occupies the spot.

Right.

> Here's how the invariant is maintained as keys are added to or removed
> from the table:
>
>   - When a key is removed, if it occupies its main spot, then the node
>     in the spot pointed to by the `next` pointer, if any, is migrated
>     into the main spot, and the key in the `next` spot is set to nil.

A key being removed means only that is value is set to nil, and nothing
at all happens with the structure of the table. Setting a value to nil
is the same as setting it to any other value. Only during rehashing
that nodes can become "virgins" again and 'next' is set to nil.

>   - When a key is removed, if it doesn't occupy its main spot, then it
>     is removed from the linked list that originates in the main spot,
>     and its the key in its node is set to nil.

Again, that is not how it works (at least in the current version). When
a key is removed, it is kept in whatever linked list it was before.
The key itself can be cleared during GC (to avoid it refering a dead
object), but it is not set to nil. (See macro 'setdeadkey'.)

>   - When a key is added, if its main spot is unoccupied, it is added
>     to that spot with a NULL `next` pointer.

Again, the 'next' pointer is not changed. The main spot can be free
(value==nil) and still be part of a chain.

>   - When a key-value pair (k, v) is added, if its main spot is
>     occupied by key-value pair (k', v'), and if k and k' hash to
>     different spots, then the main spot is overwritten with (k, v),
>     and (k', v') is added.

Right.

-- Roberto
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org

```