lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


I've been mucking about with weak tables recently, and noticed my implementation of a memoization function had a fairly significant memory leak. Assuming I had made a mistake somewhere that prevented garbage collection, I simplified it to the following code:

local function memoize(a_function)

    local cache = setmetatable({}, {__mode = 'v'})
    local result_of = setmetatable({}, {__mode = 'k'})
   
    return function(arg)
   
        local cache_node_of_arg = cache[arg]
       
        if not cache_node_of_arg then
            cache_node_of_arg = {}
            cache[arg] = cache_node_of_arg
        end
       
        local result = result_of[cache_node_of_arg]
       
        if result == nil then
            result = a_function(arg)
            result_of[cache_node_of_arg] = result
        end

        return result
    end
end

The extra layer of indirection (result_of[cache[arg]] instead of cache[arg]) is important because, in the original (non-simplified) function, it is needed to support functions that accept multiple arguments.

The expected behavior is that this structure leads to little or no increase in memory usage after garbage collection (not including any subsequently used results). What actually happened was that, when a memoized function was called many times, memory usage consistently increased, sometimes resulting in 'not enough memory' panics. A large part of the memory usage increase persisted for the lifetime of the memoized function despite garbage collection, or in the case of LuaJIT, the lifetime of the running program.

What is really odd is that this does not happen if collectgarbage() is called soon after calling a memoized function. To demonstrate, the following code calls memoize on a function and prints memory usage before and after a large number of calls.

local function gc_print(str) io.stdout:write(str, collectgarbage'count'*1024,'\n'); io.stdout:flush() end

local n = 1000000

print('first run: '..n..' objects created')
local f = memoize(function(x) return {} end)

collectgarbage()
gc_print('before creation loop:\t')

for i = 1, n do f(i) end
gc_print('after loop:\t\t')

collectgarbage()
gc_print('after collection:\t')

f = nil
collectgarbage()
gc_print('after f = nil:\t\t')

print()

print('second run: '..n..' objects created, calling collectgarbage() after each')
f = memoize(function(x) return {} end)

collectgarbage()
gc_print('before creation loop:\t')

for i = 1, n do f(i); collectgarbage() end
collectgarbage()
gc_print('after loop:\t\t')

collectgarbage()
gc_print('after collection:\t')

f = nil
collectgarbage()
gc_print('after f = nil:\t\t')

print()

Here is the output for Lua 5.2 with 1 million objects:

first run: 1000000 objects created
before creation loop:    26932
after loop:        29579892
after collection:    20998452
after f = nil:        26388

second run: 1000000 objects created, calling collectgarbage() after each
before creation loop:    26932
after loop:        27012
after collection:    27012
after f = nil:        26388

The same test on Lua 5.1 returns similar results

Here it is for LuaJIT, though with only 100 000 because the repeated calls to collectgarbage() in the second loop take significantly longer:

first run: 100000 objects created
before creation loop:    29700
after loop:        3057975
after collection:    1726615
after f = nil:        1214295

second run: 100000 objects created, calling collectgarbage() after each
before creation loop:    1214639
after loop:        1215659
after collection:    1215659
after f = nil:        1215659

Calling collectgarbage() after every call to this specific function is obviously not a good solution because it can introduce massive overhead and defeats the point of memoizing a function in the first place by destroying the cache.

I'm pretty sure this memory leak is not my fault, but I hope I'm wrong and did something silly/misguided/stupid.