• Subject: Re: thought on Lua's hash-table invariants
• From: Philippe Verdy <verdyp@...>
• Date: Thu, 23 Apr 2020 08:10:55 +0200

```In reality, setting a table's key to nil should have also the effect
in the "node" of setting BOTH pointers to the key and to the value to
nil,
But the next pointer unchanged (to not break existing chains) because
there's no way to know if it is linked from another node without
performing a full scan of the hash table (this full scan could be
avoided by having a double chain with another "prev" pointer in each
node).

Now if you have a double-linked chain, you can safely and completely
clear any node when you set a table's value to nil. And you can then

When you set a table's entry value to a non-nil value, if its "main"
node (at the position derived from the hash of the key, modulo the
hash table size) is occupied (the pointers for the key and for the
value are necessarily BOTH non-nil) and you cannot find the key on the
forward chain of nodes starting at that main node, you can use the
free list to allocate a new one: remove the free node from the
passing by the main node position (you don't need to perform any full
scan of the hash table for that, but you'll have already performed a
partial scan of the forward list only).

Using a double-linked chain can help reduce the length of collision
chains, at the modest price of an additional pointer in each node,
i.e. 4 pointers {key,value,prev,next} instead of 3 {key,value,next}.
Nodes are not necessarily in a single object, as the hashtable can use
separate vectors of the same size for keys, values, prev and next
pointers, and the prev/next pointers can also be reduced in size by
changing them to small integers (they are in fact only relative index
of the node into the same 4 vectors), while the keys/values can both
be pointers to objects of any type with their own allocation and
lifetime. It is possible to group them into two vectors, a "kv" vector
of size 2N storing (key,value) pairs as two successive object
pointers, and a "pn" vector also of size 2N storing (prev,next) pairs
as two successive small integers (the size of integers would not need
to be larger than 16-bit which means a hash table with up to 64K
nodes).

If you need to store more nodes for Lua tables containing more than
64K distinct keys associated with non-nil values, it's preferable to
split the node index into two parts with 16-bit at most for each part
so that you can allocate new "kv" "pn" vectors indexed by the
low-order bits of the node number and store their reference in a
parent vector indexed by the high-order bits of the node number. But
the "pn" vectors in that case would larger integers (could be 32-bit
integers for allowing up to 4 billions distinct keys, or 64-bit
otherwise only on 64-bit platforms); the "kv" vectors however should
use the standard pointer size for any object (and will often be
64-bit, except on 32-bit platforms)... Such limitation of size for
"kv" and "pn" vectors can dramatically reduce the memory maintenance
by limiting the level of fragmentation of the global memory. It adds
one level of indirection but can avoid wasting more memory than
needed; the "kv" and "pn" vectors are all the same size (a small power
of 2, so they can be allocated both in the same block of memory), but
the parent vector pointing to them no longer needs to be a power of 2
(so the total number of nodes in the hash table will not be
necessarily a power of 2, but will be any multiple of the small power
of 2 chosen for the size of "kn" or "pn" vectors).

And the garbage collector can still help recovering the unused objects
containing an empty "kn" vector and an "pn" vector and can easily move
them to keep the data locality in memory and help recovering the
committed size of the global Lua VM on the host system (because that
Lua VM would occupy also space in the paging file of the system and
could fragment it for long time even if it is no longer in use).

Le jeu. 23 avr. 2020 à 06:29, Robert Burke <sharpobject@gmail.com> a écrit :
>
> On Wed, Apr 22, 2020 at 10:23 PM Roberto Ierusalimschy
> <roberto@inf.puc-rio.br> wrote:
> >
> > > <roberto@inf.puc-rio.br> wrote:
> > > > >   - When a key-value pair (k, v) is added, if its main spot is
> > > > >     occupied by key-value pair (k', v'), and if k and k' hash to
> > > > >     different spots, then the main spot is overwritten with (k, v),
> > > > >     and (k', v') is added.
> > > >
> > > > Right.
> > >
> > > Is the chain that the node was previously a part of modified by the
> > > removal of (k', v')?
> >
> > No. As I said in that other message, a removal does not change anything
> > in a table. It only sets the node's value to nil, so that it can be
> > reused by another key that hashes to that node (or by the same key,
> > if it is reinserted). Only rehashes actually clear previous removals.
> > (But removals also do not trigger rehashes.)
> >
> Sorry, I think I was unclear in my original email. I meant to ask "Is
> the chain that the node was previously a part of modified as a part of
> inserting (k, v) and moving (k', v')?" I had a look at the code and
> the answer to my revised question is "Yes, the chain starting at the
> main slot of k' is followed from the beginning until the node before
> the main slot of k, then that node's next pointer is set to point to a
> free node, and the node containing (k', v') including its next pointer
> is copied into the free node to make room for (k, v)"
> _______________________________________________
> lua-l mailing list -- lua-l@lists.lua.org
> To unsubscribe send an email to lua-l-leave@lists.lua.org
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org
```