Hi,

A few days ago, I was trying to understand why, in some cases, the memory used by several of my test programs kept growing forever unless I called "collectgarbage()" regularly to force full garbage collection cycles. After some reading (esp. the last chapter of PIL, 3rd edition) and digging in the source code, I got some idea of the cause of the problem, and I could mostly fix it by setting Lua's GC parameters "pause" and "stepmul" to more aggressive values than the defaults.

This message is a brief summary of these investigations. As it turns out, there are not many posts on the list about configuring the GC, so I hope it will be of some help to others with similar issues. It is also useful to give feedback to the Lua team on the usage and behavior of the GC (the call for feedback on the generational garbage collection a year ago didn't get many answers ;-).

But first: the context.
Running with Lua 5.2.2 and a Lua to Objective C bridge, these programs make heavy use of small userdata - sizeof(void*) - that contain pointers to external ObjC objects and keep them alive until the corresponding finalizer is called by the Lua VM. This means that most of the memory allocated to the program is not visible to the VM, yet the lifecycle of this memory is directly controlled by the VM through the operation of the GC.

In this configuration it seemed that the garbage collection cycles never ran to completion until I increased GCSTEPMUL to 800 and lowered GC_PAUSE to 125 (these are now the defaults in my code btw).
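
For reference, here is a rough sketch of what the "pause" value means in terms of memory thresholds. It follows the reference manual's description (a pause of 200 makes the collector wait until the total memory in use doubles before starting a new cycle). The helper name next_cycle_threshold is made up for illustration; it only mirrors the arithmetic and is not the actual code in lgc.c. Note that "memory in use" here is only what the Lua allocator sees, so with sizeof(void*) userdata the external ObjC memory does not count towards it. In a real program the parameters themselves would be set with lua_gc(L, LUA_GCSETPAUSE, 125) and lua_gc(L, LUA_GCSETSTEPMUL, 800), or with collectgarbage("setpause", 125) and collectgarbage("setstepmul", 800) from Lua.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Lua: approximate number of bytes
   (as seen by the Lua allocator) at which the next GC cycle starts,
   given the bytes in use after the previous cycle and the "pause"
   parameter expressed as a percentage. */
size_t next_cycle_threshold(size_t in_use_after_gc, int pause) {
  assert(pause >= 0);
  /* divide first to limit overflow risk; the result is rounded down */
  return (in_use_after_gc / 100) * (size_t)pause;
}
```

With 1,000,000 bytes in use after a collection, next_cycle_threshold(1000000, 200) gives 2,000,000 while next_cycle_threshold(1000000, 125) gives 1,250,000, so the lower pause starts the next cycle much earlier.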

Even then the finalizers didn't seem to be called incrementally during the GC steps, but rather were (almost) all called at the end of the GC cycle.
So the external memory could be released only at infrequent points in time, and with large objects like images, this could be an issue.

Putting a conditional breakpoint at the end of the lgc.c function luaC_forcestep to display the number of finalizers called at each step (when non-zero) gave patterns like:
2013-10-20 15:46:57.565 Lua called finalizers of 4 objects
2013-10-20 15:46:57.915 Lua called finalizers of 4 objects
2013-10-20 15:46:58.871 Lua called finalizers of 4 objects
2013-10-20 15:46:59.933 Lua called finalizers of 4 objects
2013-10-20 15:47:00.600 Lua called finalizers of 4 objects
2013-10-20 15:47:01.043 Lua called finalizers of 4 objects
2013-10-20 15:47:01.987 Lua called finalizers of 4 objects
2013-10-20 15:47:02.682 Lua called finalizers of 4 objects
2013-10-20 15:47:02.930 Lua called finalizers of 4 objects
2013-10-20 15:47:04.021 Lua called finalizers of 4 objects
2013-10-20 15:47:04.708 Lua called finalizers of 4 objects
2013-10-20 15:47:06.703 Lua called finalizers of 4 objects
2013-10-20 15:47:08.508 Lua called finalizers of 4 objects
2013-10-20 15:47:09.204 Lua called finalizers of 4 objects
2013-10-20 15:47:10.099 Lua called finalizers of 4 objects
2013-10-20 15:47:11.104 Lua called finalizers of 4 objects
2013-10-20 15:47:11.827 Lua called finalizers of 1789 objects

So a small number of finalizers is called at the end of each step (at most 4), and the rest of them are called when the "pause" state is reached.

I understand that the VM doesn't have any hint of the cost of a finalizer call, hence the limitation to 4. But logically this limit should at least grow with the stepmul value.

So, by replacing in luaC_forcestep:

  for (i = 0; g->tobefnz && (i < GCFINALIZENUM || g->gcstate == GCSpause); i++)
    GCTM(L, 1);  /* call one finalizer */

with:

  int maxfinalizenum = (GCFINALIZENUM * g->gcstepmul) / STEPMULADJ;
  for (i = 0; g->tobefnz && (i < maxfinalizenum || g->gcstate == GCSpause); i++)
    GCTM(L, 1);  /* call one finalizer */

the calls of finalizers become slightly better balanced.
E.g. with stepmul = 800, a typical pattern now is:
2013-10-22 15:08:52.996 Lua called finalizers of 16 objects
2013-10-22 15:08:53.136 Lua called finalizers of 16 objects
2013-10-22 15:08:53.276 Lua called finalizers of 16 objects
2013-10-22 15:08:54.514 Lua called finalizers of 16 objects
2013-10-22 15:08:54.739 Lua called finalizers of 16 objects
2013-10-22 15:08:54.922 Lua called finalizers of 16 objects
2013-10-22 15:08:55.066 Lua called finalizers of 16 objects
2013-10-22 15:08:55.332 Lua called finalizers of 16 objects
2013-10-22 15:08:55.667 Lua called finalizers of 16 objects
2013-10-22 15:08:55.815 Lua called finalizers of 16 objects
2013-10-22 15:08:55.962 Lua called finalizers of 16 objects
2013-10-22 15:08:56.108 Lua called finalizers of 16 objects
2013-10-22 15:08:56.250 Lua called finalizers of 16 objects
2013-10-22 15:08:56.428 Lua called finalizers of 16 objects
2013-10-22 15:08:56.601 Lua called finalizers of 319 objects
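
As a sanity check on the 16 in the log above, here is the arithmetic of the patched line in isolation. The two constants are assumed to be the values used by lgc.c in Lua 5.2 (GCFINALIZENUM = 4, STEPMULADJ = 200), which is at least consistent with the 4 and 16 seen in the logs; check them against your own source tree. The function name is made up for this sketch.

```c
#include <assert.h>

/* Assumed values from Lua 5.2's lgc.c; verify against your sources. */
#define GCFINALIZENUM 4   /* finalizers called per GC step in the stock code */
#define STEPMULADJ 200    /* divisor used to normalise stepmul */

/* Per-step finalizer cap as computed by the patched loop
   (hypothetical standalone helper, not a Lua function). */
int max_finalizers_per_step(int stepmul) {
  assert(stepmul >= 0);
  return (GCFINALIZENUM * stepmul) / STEPMULADJ;  /* integer division */
}
```

With stepmul = 800 this gives 16, and with stepmul = 200 it gives the original 4. Because the division is an integer one, a stepmul below 50 would give a cap of 0, so a real patch might want to keep a minimum of one finalizer per step.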

We could even go further and define an extra GC parameter to indicate an estimate of the finalizer cost.
Indeed, in the case of a Lua to XXX bridge, the finalizer will usually be a C function with very little overhead. For example, for the Lua ObjC bridge, the finalizer is merely a call to the "release" method of the referenced object, which is very cheap. By setting an estimated finalizer cost, the program could control the number of finalizers called at each GC step and so better balance when the external memory allocated to the program is released.
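
To make the idea a bit more concrete, one possible shape for such a parameter is sketched below. Everything here is hypothetical: the "fnzcost" parameter, the FNZCOSTADJ constant and the function name do not exist in Lua, and GCFINALIZENUM = 4 and STEPMULADJ = 200 are assumed from Lua 5.2's lgc.c. The cost is expressed as a percentage, where 100 means "same cost as assumed today" and smaller values mean cheaper finalizers, hence more of them per step.

```c
#include <assert.h>

/* Assumed values from Lua 5.2's lgc.c; verify against your sources. */
#define GCFINALIZENUM 4
#define STEPMULADJ 200
/* Hypothetical: the cost value that reproduces the stepmul-only cap. */
#define FNZCOSTADJ 100

/* Hypothetical helper: number of finalizers to call in one GC step,
   scaled up by stepmul and down by the estimated finalizer cost.
   Always returns at least 1 so that finalization keeps progressing. */
int finalizers_per_step(int stepmul, int fnzcost) {
  long n;
  assert(stepmul > 0 && fnzcost > 0);
  n = ((long)GCFINALIZENUM * stepmul * FNZCOSTADJ) /
      ((long)STEPMULADJ * fnzcost);
  return n < 1 ? 1 : (int)n;
}
```

With stepmul = 800, a cost of 100 keeps the cap at 16, while a cost of 10 (a very cheap finalizer such as a plain "release" call) raises it to 160 per step.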

What do you think of this? Could it be added in a future version of Lua?

All other feedback on this topic is of course welcome. :-)

Jean-Luc