lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Am 03.01.2014 04:57 schröbte Christopher VanZomeren:
I've been mucking about with weak tables recently, and noticed my
implementation of a memoization function had a fairly significant memory
leak. Assuming I had made a mistake somewhere that prevented garbage
collection, I simplified it to the following code:

The memory problem is related to these[1][2] posts.


local function memoize(a_function)

     local cache = setmetatable({}, {__mode = 'v'})
     local result_of = setmetatable({}, {__mode = 'k'})

     return function(arg)

         local cache_node_of_arg = cache[arg]

         if not cache_node_of_arg then
             cache_node_of_arg = {}

               -- here cache_node_of_arg is the only (strong)
               -- reference to the newly created table

             cache[arg] = cache_node_of_arg

               -- cache is weak-valued, so cache_node_of_arg is
               -- still the only strong reference to the table

         end

         local result = result_of[cache_node_of_arg]

         if result == nil then
             result = a_function(arg)
             result_of[cache_node_of_arg] = result

               -- result_of is weak-keyed, so cache_node_of_arg is
               -- *still* the only strong reference to the table

         end

         return result
     end

       -- and here cache_node_of_arg is out of scope, so the table
       -- above is collected and removed from the weak tables on the
       -- next garbage collection run. Probably not what you wanted ...

end

The extra layer of indirection (result_of[cache[arg]] instead of
cache[arg]) is important because, in the original (non-simplified)
function, it is needed to support functions that accept multiple arguments.

The expected behavior is that this structure leads to little or no increase
in memory usage after garbage collection (not including any subsequently
used results). What actually happened was that, when a memoized function
was called many times, memory usage consistently increased, sometimes
resulting in 'not enough memory' panics. A large part of the memory usage
increase persisted for the lifetime of the memoized function despite
garbage collection, or in the case of LuaJIT, the lifetime of the running
program.

What you see is the memory allocated by the weak tables. If you add elements to them (even if they will be collected in the next gc cycle) the tables have to grow to store those elements.


What is really odd is that this does not happen if collectgarbage() is
called soon after calling a memoized function.

If you call `collectgarbage()` immediately after insertion the elements in the weak tables are removed, making place for the next elements without resizing the tables.


Here is the output for Lua 5.2 with 1 million objects:

first run: 1000000 objects created
before creation loop:    26932
after loop:        29579892
after collection:    20998452

Those 20 Mb are reserved by the two empty (weak) tables. For comparison: One empty table able to hold one million key-value-pairs is 40 Mb. You don't have 80 Mb here because the tables got cleared by the gc once in a while.

after f = nil:        26388

second run: 1000000 objects created, calling collectgarbage() after each
before creation loop:    26932
after loop:        27012

The tables never need to grow, because `collectgarbage()` immediately removes all elements from them.

after collection:    27012
after f = nil:        26388


Philipp

  [1]: http://permalink.gmane.org/gmane.comp.lang.lua.general/104754
  [2]: http://permalink.gmane.org/gmane.comp.lang.lua.general/104916