lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



FYI,

I had to write a hash-table implementation for a project
recently and I re-implemented some of the idea's in the
Lua's hash-table. I can share it with you if you want.

I too have written a generic hash table implementation that I use in my C projects. It differs from Lua's hash table in one respect, with some nice consequences. To explain, here is the structure of an entry (in pseudo speak):

struct hash_entry
{
  struct hash_entry *root; /* points to the root of the chain */
  struct hash_entry *next; /* chains hash identical items */
  data_type data; /* entry data */
};

A hash table is now essentially an array of entries (capacity a power of two for easy hash folding).

An item is located as follows (pseudo code):

  /* compute index of root pointer */
  uint index = hash(item) % table->capacity;

  /* locate root of chain */
  hash_entry *entry = table->entries[index].root;

  /* search chain for item */
  while (entry && !equals(item, entry->data))
    entry = entry->next;

When filling an empty hash table, data items are simply put at positions 0, 1, 2, ... in the entries array while their hash index is used to set (or chase) the proper root pointer. (When an entry is removed, its slot is added to a free list.) When resizing a table the data items are simply re-inserted in order 0, 1, 2, ... (if these slots contain data). This has three nice effects:

1. There is never a hash collision: Either the root at some hash index is set, in which case it points to a chain of relevant hash identical items, or it is NULL in which case no items with that hash are present yet.

2. You never have to move an entry (since there are no collisions).

3. Since a rehash doesn't change the index ordering of data in the entries array, the "next" function can be implemented such that it consistently enumerates items even after a rehash. That is, all items are enumerated exactly once, except for items that where inserted during traversal, which may or may not be enumerated.


The cost is an extra root pointer per item. I won't comment on how shiny this is... ;-) It works marvelously. Hash tables rock. :-D

--
Wim