I've been mucking about with weak tables recently, and noticed my implementation of a memoization function had a fairly significant memory leak. Assuming I had made a mistake somewhere that prevented garbage collection, I simplified it to the following code:
local function memoize(a_function)
local cache = setmetatable({}, {__mode = 'v'})
local result_of = setmetatable({}, {__mode = 'k'})
return function(arg)
local cache_node_of_arg = cache[arg]
if not cache_node_of_arg then
cache_node_of_arg = {}
cache[arg] = cache_node_of_arg
end
local result = result_of[cache_node_of_arg]
if result == nil then
result = a_function(arg)
result_of[cache_node_of_arg] = result
end
return result
end
end
The extra layer of indirection (result_of[cache[arg]] instead of cache[arg]) is important because, in the original (non-simplified) function, it is needed to support functions that accept multiple arguments.
The expected behavior is that this structure leads to little or no increase in memory usage after garbage collection (not including any subsequently used results). What actually happened was that, when a memoized function was called many times, memory usage consistently increased, sometimes resulting in 'not enough memory' panics. A large part of the memory usage increase persisted for the lifetime of the memoized function despite garbage collection, or in the case of LuaJIT, the lifetime of the running program.
What is really odd is that this does not happen if collectgarbage() is called soon after calling a memoized function. To demonstrate, the following code calls memoize on a function and prints memory usage before and after a large number of calls.
local function gc_print(str) io.stdout:write(str, collectgarbage'count'*1024,'\n'); io.stdout:flush() end
local n = 1000000
print('first run: '..n..' objects created')
local f = memoize(function(x) return {} end)
collectgarbage()
gc_print('before creation loop:\t')
for i = 1, n do f(i) end
gc_print('after loop:\t\t')
collectgarbage()
gc_print('after collection:\t')
f = nil
collectgarbage()
gc_print('after f = nil:\t\t')
print()
print('second run: '..n..' objects created, calling collectgarbage() after each')
f = memoize(function(x) return {} end)
collectgarbage()
gc_print('before creation loop:\t')
for i = 1, n do f(i); collectgarbage() end
collectgarbage()
gc_print('after loop:\t\t')
collectgarbage()
gc_print('after collection:\t')
f = nil
collectgarbage()
gc_print('after f = nil:\t\t')
print()