lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Thu, Apr 23, 2020 at 3:11 PM Philippe Verdy <verdyp@gmail.com> wrote:
>
> In reality, setting a table's key to nil should have also the effect
> in the "node" of setting BOTH pointers to the key and to the value to
> nil,
> But the next pointer unchanged (to not break existing chains) because
> there's no way to know if it is linked from another node without
> performing a full scan of the hash table (this full scan could be
> avoided by having a double chain with another "prev" pointer in each
> node).

There are a lot of ways to avoid a full table scan in this case, other
than throwing out the idea of scatter tables altogether or using a
doubly linked list.

One invariant that might not have been mentioned so far, but which is
nonetheless true, is that each node is pointed to by at most 1 other
node's next pointer. The code used to move a colliding key out of an
inserted key's main slot depends on this fact.

One approach to avoiding a full table scan is to just delete the node
from the chain at the time when you set t[k] = nil. This has a few
cases: If the key it in its main slot, and the next pointer not null,
you should move the node from the next slot into the main slot. If the
node is not in its main slot, some other node prev points to it. You'd
want to set prev.next = prev.next.next, then set the key and value on
the node to nil.

This approach seems nice and easy, but it breaks the built-in next
function, which should take in a table and a key and return the next
key in some order. Repeatedly calling next with the key it previously
returned should let a user traverse the entire table, as long as no
new keys are added to the table. So this whole idea of removing the
key from the data structure entirely or rearranging the keys when
setting an existing key to nil doesn't really work given the
guarantees provided by next. We're only allowed to do this sort of
cleanup work when inserting a new key.

If when inserting a key k2, we were guaranteed to find either a chain
rooted at it's main position mp(k2) or a node in a chain for some
other key k1, with some way of quickly finding mp(k1), then we would
be able to delete mp(k2) from k1's chain quickly (in time proportional
to the length of k1's chain, just like in the case where we have to
move a colliding non-empty node). Unfortunately, k1 is not guaranteed
to be present. It is left in the table when you originally set t[k1] =
nil to allow next(t,k1) to work, but after that it could be removed by
the GC, leaving a node like {k=nil, v=nil, next=not null}. Oh no!

We could provide ourselves a way of finding mp(k1) if when we
originally set t[k1] = nil, we set the integer in the node's value to
the hash of k1 (recalling that a lua value is a tagged union, and the
union part is unused by nils). Then, when we go to insert k2 and we
find {key=anything, value=nil, next=not null}, we could traverse the
chain starting at the slot in the value's integer part to remove this
node from it before inserting k2.

This would seem to get us to a state of affairs where chains can never
overlap, next works as it does now, and no operation becomes
substantially more expensive.

I spent a bit of time worrying that in use cases where many keys are
inserted and removed from a table while the size of the table remains
more or less the same, like the hash table attached to a heap that
supports update-key in A* or Dijkstra or an LRU cache, the performance
may degrade. My mind was put at ease when Roberto shared that all this
won't matter anyway, because for use cases like these the table will
be regularly rehashed even if it does not change size.

Thanks,
Robert
_______________________________________________
lua-l mailing list -- lua-l@lists.lua.org
To unsubscribe send an email to lua-l-leave@lists.lua.org