lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


There are however good strategies that are intended to limit the
volume of rehashing:
For example, instead of using a single hash table with a fixed size,
you may want to use a set of subtables with different sizes:
- Each subtable will used in the same order a which they were created.
They may have non-necessarily growing sizes. Each one has its own fill
factor thresholds (min and max) defined according to their allocated
size.
- Each subtable may be of one of two types: sequential (not storing
keys when they are integers in a contiguous range), or hashed. Integer
keys will be looked up in sequential subtables (if they are in their
range), otherwise in the hashed subtables; all other keys (including
integer keys not matching the ranges for sequential subtables) will be
only in one of the hashed subtables (that will store the keys and
values and not just the values).
- When one of the hashed subtables exceeds its maximum threshold
(meaning it has too long collision lists), you may want to rehash a
part of its keys into one of the other subtables: first try relocating
the value into one of the sequential tables, if their range fit for an
an integer key, else relocate the (key,value) pair in one of the other
hashed subtables that does not exceed its threshold. If all subtables
have reched their maximum threshold, you'll need to merge the two
smallest tables into a larger one to make place for a new hashed
table.

When unsetting a key's value to null, its previous slot in one of the
subtables may become null:
- if that slot was in a sequential subtable, that subtables may become
larger than needed so that its minimum threshold will be exceeded: you
may want to truncate that table if there are enough null slots at
start or end of the sequential subtable to reduce the range of integer
keys it represents (the minimum integer key value is storable, the
maximum integer key value is implied for the minimum and the subtable
size). Otherwise you could split the table in two parts, merging one
part into another sequential subtable which could fit some keys in
their range, or putting the remaining isolated elements into the
hashed subtables, before reducing the sequential subtable size.
- if that slot was in a hashed subtable, that subtable may also have
too many free slots for its minimum fill factor threshold, and here
again you may want to determine if its keys should be merged into the
other sequential subtables (if they fit the key) or other hashed
subtables (if their current capacity allows it without exceeding its
maximum threshold), or if another hashed subtable should be merged
into it. Which subtable you'll merge to another should be based on the
number of keys they currently store. This can then allow to free one
of the hashed subtables.

All in all, the work needed for reshing will be more limited, the
tables will autoadjust themselves to optimize the sequences (including
for the current worst case where a table is filled sequentially but in
backward order, where almost everything goes to the hashed subtable).
Fill factor thresholds in each subtables will help maintain the
rempresentation compact, and the chains for key lookup will be
minimized.