G'day everybody.

This is a naive question which has doubtless been answered many times. Recent postings about table.sort() imply but don't explain the answer.

In the LuaSocket and COPAS code you can find a utility "set" class such as this:

-- simple set implementation
-- the select function doesn't care about what is passed to it as long as
-- it behaves like a table
-- creates a new set data structure
function newset()
    local reverse = {}
    local set = {}
    return setmetatable(set, {__index = {
        insert = function(set, value)
            if not reverse[value] then
                table.insert(set, value)
                reverse[value] = table.getn(set)
            end
        end,
        remove = function(set, value)
            local index = reverse[value]
            if index then
                reverse[value] = nil
                local top = table.remove(set)
                if top ~= value then
                    reverse[top] = index
                    set[index] = top
                end
            end
        end
    }})
end
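
For what it's worth, here is a small usage sketch (my own illustration, not code taken from LuaSocket or COPAS):

    local set = newset()
    set:insert("a")
    set:insert("b")
    set:insert("c")
    set:remove("b")       -- "c" is swapped down into slot 2, so the array part stays dense
    for i, v in ipairs(set) do print(i, v) end   --> 1 a, then 2 c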

This code is very cool, and the "add at end" and "delete by removing the last element and writing it over the one you want to get rid of" approach is clearly more time-efficient than just using set[value] = true to add and set[value] = nil to delete items.
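
For comparison, the plain-table approach I mean is something like this (a minimal sketch; the names are mine, not anything from LuaSocket or COPAS):

    -- plain hash-style set: members are keys, no dense array part is maintained
    local function newhashset()
        local items = {}
        return {
            insert   = function(value) items[value] = true end,
            remove   = function(value) items[value] = nil end,
            contains = function(value) return items[value] ~= nil end,
            -- iteration has to walk the hash part with pairs() rather than a dense array
            each     = function(f) for value in pairs(items) do f(value) end end,
        }
    end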

My question is, how much more efficient (compared to the overhead of using two tables where one is enough)? If I add items with random indices to a table and then randomly remove them, where is the inefficiency? Is it in the garbage collection of the removed items? Is it a raft of O(n) memcpy()s when compacting metadata (e.g. hashes) for tables which have scattered bunches of removed items?

Where does the time go in starting with an empty (non-array) table and adding and removing entries arbitrarily, where the adds and removes are paired, but only in the long run? (E.g. you might add 50 items, then remove a random 10 of them, then add 16 more, then remove 30 of all that are left, then add 6, then remove the lot, then add 66, rinse, repeat for a long time.)
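
To make the question concrete, here is roughly how I would try to time such a workload (a sketch only, with arbitrary batch sizes and iteration counts; it can drive either the COPAS-style set quoted above or a plain table):

    -- churn: repeatedly add a random batch of fresh values, then remove a random
    -- subset of whatever is still live, using the supplied insert/remove functions
    local function churn(insert, remove, iterations)
        local live = {}                       -- values currently in the set
        local nextid = 0
        for i = 1, iterations do
            for _ = 1, math.random(1, 66) do  -- add a random batch
                nextid = nextid + 1
                insert(nextid)
                table.insert(live, nextid)
            end
            local removals = math.random(0, table.getn(live))
            for _ = 1, removals do            -- remove a random subset
                local last = table.getn(live)
                local k = math.random(last)
                remove(live[k])
                live[k] = live[last]          -- swap-remove from the bookkeeping list
                table.remove(live)
            end
        end
    end

    local s = newset()
    local t0 = os.clock()
    churn(function(v) s:insert(v) end, function(v) s:remove(v) end, 5000)
    print("COPAS-style set:", os.clock() - t0)

    local h = {}
    t0 = os.clock()
    churn(function(v) h[v] = true end, function(v) h[v] = nil end, 5000)
    print("plain table:    ", os.clock() - t0)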

(Apologies if this is answered in detail in PiL, which it probably is. If anyone has hard measurements of the differences in behaviour, I'd be most interested. Any OS at all from the set {Windows, Linux} :-)