[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Utility functions to manage sets
- From: Duck <duck@...>
- Date: Tue, 3 Jul 2007 10:59:20 +1000 (EST)
G'day everybody.
This is a naive question which has doubtless been answered many times.
Recent postings about table.sort() imply but don't explain the answer.
In code in LuaSocket and COPAS, you can find a utility "set" class such as
this:
-- simple set implementation
-- the select function doesn't care about what is passed to it as long as
-- it behaves like a table
-- creates a new set data structure
function newset()
local reverse = {}
local set = {}
return setmetatable(set, {__index = {
insert = function(set, value)
if not reverse[value] then
table.insert(set, value)
reverse[value] = table.getn(set)
end
end,
remove = function(set, value)
local index = reverse[value]
if index then
reverse[value] = nil
local top = table.remove(set)
if top ~= value then
reverse[top] = index
set[index] = top
end
end
end
}})
end
This code is very cool, and the "add at end" and "delete by removing the
last element and writing it over the one you want to get rid of" is
clearly more time efficient than just using set[value] = true to add and
set[value] = nil to delete items.
My question is, how much more efficient (compared to the overhead of using
two tables where one is enough)? If I add items with random indices to a
table and then randomly remove them, where is the inefficiency? Is it in
the garbage collection of the removed items? Is it a raft of O(n)
memcpy()s when compacting metadata (e.g. hashes) for tables which have
scattered bunches of removed items?
Where does the time go in starting with an empty (non-array) table and
adding and removing entries arbitrarily, where the the adds and removes
are paired, but only in the long run. (E.g. you might add 50 items, then
remove a random 10 of them, then add 16 more, then remove 30 of all that
are left, then add 6, then remove the lot, then add 66, rinse, repeat
for a long time.)
(Apologies if this is answered in detail in PiL, which it probably is. If
anyone has hard measurements of the differences in behaviour I'd be most
interested. Any OS at all from the set {Windows, Linux} :-)