lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On 30 December 2011 23:04, Leo Razoumov <slonik.az@gmail.com> wrote:
On Fri, Dec 30, 2011 at 09:45, Peter Cawley <lua@corsix.org> wrote:
> On Fri, Dec 30, 2011 at 2:31 PM, Leo Razoumov <slonik.az@gmail.com> wrote:
>> On Thu, Dec 29, 2011 at 13:58, Peter Cawley <lua@corsix.org> wrote:
>>>
>>> The fundamental messages here are:
>>> 1) Inserting N elements into a hash table insertion is O(N^2) worst
>>> case (regardless of what particular hash table implementation is being
>>> used).
>>
>> Actually, it is not true. You can use self-balancing trees like
>> RB-tree to store elements that hash to the same value. In this case
>> insert/search will be O(log(N)) per element and O(Nlog(N)) per
>> N-elements. Of course, RB-tree has its own overhead. Therefore, one
>> can try an adaptive approach. Whenever a certain bucket becomes too
>> large k>Kmax convert it to RB tree, else keep it a list. No
>> randomization required.
>
> Once you start merging trees and hash tables, you leave the realms of
> what I'd traditionally call a hash table, but I concede your point;
> adaptive choice of hybrid algorithms certainly seems like a nicer
> solution than randomisation.

It is still a hash table but the collision resolution (which is
conventionally done with chaining (a list) within a bucket of
colliding hashes)
is replaced with a RB-tree only if the bucket size exceeds a threshold
K_tree. It is converted back to a list if the bucket size drops below
another threshold K_list (which is smaller than K_tree). Having two
distinct threshold values enables hysteresis behavior to prevent
ping-pong effect.

Most of the time a list based implementation is in effect (as it is
now in Lua). It is only during an attack the RB-tree machinery will
come into play. And O(log(N)) insert/search/delete per element does
not sound too bad.

--Leo--


Another solution to this, as discussed on #lua, is to use cuckoo hashing (http://en.wikipedia.org/wiki/Cuckoo_hashing) which keeps things purely hash-based (or as close to as to make no odds) while maintaining the O(1) characteristics that make hashing so attractive in the first place.

--
"Perhaps people don't believe this, but throughout all of the discussions of entering China our focus has really been what's best for the Chinese people. It's not been about our revenue or profit or whatnot."
--Sergey Brin, demonstrating the emptiness of the "don't be evil" mantra.