• Subject: Re: Hash Table Collisions (n.runs-SA-2011.004)
• From: Leo Razoumov <slonik.az@...>
• Date: Fri, 30 Dec 2011 12:42:03 -0500

```On Fri, Dec 30, 2011 at 11:55, Michael Richter <ttmrichter@gmail.com> wrote:
> On 30 December 2011 23:04, Leo Razoumov <slonik.az@gmail.com> wrote:
>>
>> On Fri, Dec 30, 2011 at 09:45, Peter Cawley <lua@corsix.org> wrote:
>> > On Fri, Dec 30, 2011 at 2:31 PM, Leo Razoumov <slonik.az@gmail.com>
>> > wrote:
>> >> On Thu, Dec 29, 2011 at 13:58, Peter Cawley <lua@corsix.org> wrote:
>> >>>
>> >>> The fundamental messages here are:
>> >>> 1) Inserting N elements into a hash table insertion is O(N^2) worst
>> >>> case (regardless of what particular hash table implementation is being
>> >>> used).
>> >>
>> >> Actually, it is not true. You can use self-balancing trees like
>> >> RB-tree to store elements that hash to the same value. In this case
>> >> insert/search will be O(log(N)) per element and O(Nlog(N)) per
>> >> N-elements. Of course, RB-tree has its own overhead. Therefore, one
>> >> can try an adaptive approach. Whenever a certain bucket becomes too
>> >> large k>Kmax convert it to RB tree, else keep it a list. No
>> >> randomization required.
>> >
>> > Once you start merging trees and hash tables, you leave the realms of
>> > what I'd traditionally call a hash table, but I concede your point;
>> > adaptive choice of hybrid algorithms certainly seems like a nicer
>> > solution than randomisation.
>>
>> It is still a hash table but the collision resolution (which is
>> conventionally done with chaining (a list) within a bucket of
>> colliding hashes)
>> is replaced with a RB-tree only if the bucket size exceeds a threshold
>> K_tree. It is converted back to a list if the bucket size drops below
>> another threshold K_list (which is smaller than K_tree). Having two
>> distinct threshold values enables hysteresis behavior to prevent
>> ping-pong effect.
>>
>> Most of the time a list based implementation is in effect (as it is
>> now in Lua). It is only during an attack the RB-tree machinery will
>> come into play. And O(log(N)) insert/search/delete per element does
>>
>> --Leo--
>>
>
> Another solution to this, as discussed on #lua, is to use cuckoo hashing
> (http://en.wikipedia.org/wiki/Cuckoo_hashing) which keeps things purely
> hash-based (or as close to as to make no odds) while maintaining the O(1)
> characteristics that make hashing so attractive in the first place.
>

Cuckoo hashing is, indeed, interesting. However, there are still open
issues with guaranteeing its worst case performance, c.f.
http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pdf

--Leo--

```