• Subject: Re: thought on Lua's hash-table invariants
• From: Norman Ramsey <nr@...>
• Date: Sat, 25 Apr 2020 09:25:12 -0400

``` > > If anyone on the Lua team cares to comment, I would be delighted!
>
> My pleasure! :-)
>
> Before we go on, note that, although the main algorithm has remained the
> same during the years, several details of the kind you are discussing
> have changed across versions. I will use the current version (5.4 :-)
> as the reference here.  If you used another version for your analysis
> (2.5?), let me know.

I was studying version 5.3.4.  Tolerably up to date!

Thanks for your corrections!  I need to go study some more.
Here is the key thing I got wrong:

> >   - When a key is removed, if it occupies its main spot, then the node
> >     in the spot pointed to by the `next` pointer, if any, is migrated
> >     into the main spot, and the key in the `next` spot is set to nil.

And as Philippe Verdy kindly points out, the issue with this plan is
that removal from the linked list can't be done in constant time.

(I am wondering about making the linked list circular, so an element
could be removed in time proportional to the length of a collision
chain, but I'll have to postpone idea that until I can set aside some
time to study it.)

> > The table satisfies this invariant:
> >
> >   - If the is a set of size K > 0 of keys in the hash part that all
> >     hash to the same spot, then that set is exactly the set of keys
> >     reachable by starting with the key in the spot and following
> >     `next` pointers.
>
> As discussed here recently, two lists may merge.

Got it!  But if I understand correctly, this happens only if a removal
is followed by an insertion.   It would be interesting to think about
the probability of such an event.  A new node has to hash to the
position previously occupied by a removed node.  If there is a long
sequence of remove/insert operations with different keys, I wonder if
short chains could eventually merge to become long ones (as the
overall size doesn't change)?

> We might choose not to reuse N2, considering free to reuse only nodes
> with key==nil. In general, we do not consider uses that removes entries
> from tables "hot uses" (using a table as an array or a record
> creates only "hot uses"), and so we think any performance differences
> in those uses will be marginal for most programs. Moreover, it is
> difficult to know which strategy would be "better". Reusing N2
> may create longer lists, but it postpones rehashes.

Understood.  The reuse strategy has a grim worst case, but that worst
case may be nearly impossible to reach.  And as you said, the worst
case is not what you are engineering for.

Thanks much!  Things are not quite as simple as I had hoped, but they
are almost so.  I am looking forward to putting all this information
together for my students (this summer).

Norman
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org

```