[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Lua table implementation
- From: Wim Couwenberg <w.couwenberg@...>
- Date: Tue, 03 Oct 2006 20:11:49 +0200
FYI,
I had to write a hash-table implementation for a project
recently and I re-implemented some of the idea's in the
Lua's hash-table. I can share it with you if you want.
I too have written a generic hash table implementation that I use in my
C projects. It differs from Lua's hash table in one respect, with some
nice consequences. To explain, here is the structure of an entry (in
pseudo speak):
struct hash_entry
{
struct hash_entry *root; /* points to the root of the chain */
struct hash_entry *next; /* chains hash identical items */
data_type data; /* entry data */
};
A hash table is now essentially an array of entries (capacity a power of
two for easy hash folding).
An item is located as follows (pseudo code):
/* compute index of root pointer */
uint index = hash(item) % table->capacity;
/* locate root of chain */
hash_entry *entry = table->entries[index].root;
/* search chain for item */
while (entry && !equals(item, entry->data))
entry = entry->next;
When filling an empty hash table, data items are simply put at positions
0, 1, 2, ... in the entries array while their hash index is used to set
(or chase) the proper root pointer. (When an entry is removed, its slot
is added to a free list.) When resizing a table the data items are
simply re-inserted in order 0, 1, 2, ... (if these slots contain data).
This has three nice effects:
1. There is never a hash collision: Either the root at some hash index
is set, in which case it points to a chain of relevant hash identical
items, or it is NULL in which case no items with that hash are present yet.
2. You never have to move an entry (since there are no collisions).
3. Since a rehash doesn't change the index ordering of data in the
entries array, the "next" function can be implemented such that it
consistently enumerates items even after a rehash. That is, all items
are enumerated exactly once, except for items that where inserted during
traversal, which may or may not be enumerated.
The cost is an extra root pointer per item. I won't comment on how
shiny this is... ;-) It works marvelously. Hash tables rock. :-D
--
Wim