On 2013-10-22 10:55 AM, "Jean-Luc Jumpertz" <jean-luc@celedev.eu> wrote:
>
> Hi,
>
> A few days ago, I was trying to understand why, in some cases, the memory used by several of my test programs kept growing forever unless I called "collectgarbage()" regularly to force full garbage collection cycles. After some reading (esp. the last chapter of PIL 3rd edition) and digging in the source code, I got some idea of the cause of the problem, and I could fix most of it by setting Lua's GC parameters "pause" and "stepmul" to more aggressive values than the default ones.
>
> This message is a brief summary of these investigations. It turns out that there are not many posts on the list about configuring the GC, so I hope it can be of some help to others with similar issues. It is also useful to give feedback to the Lua team on the usage and behavior of the GC (the call for feedback on the generational garbage collection a year ago didn't get many answers ;-).
>
> But first, the context.
> Running with Lua 5.2.2 and a Lua to Objective C bridge, these programs make heavy use of small userdata - sizeof(void*) - that contain pointers to external ObjC objects and keep them alive until the corresponding finalizer is called by the Lua VM. This means that most of the memory allocated to the program is not visible from the VM, yet the lifecycle of this memory is directly controlled by the VM through the operation of the GC.
>
> In this configuration it seemed that the garbage collection cycles never ran to completion until I raised "stepmul" to 800 and lowered "pause" to 125 (these are now the defaults in my code btw).
>
> Even then, the finalizers didn't seem to be called incrementally during the GC steps; instead, (almost) all of them were called at the end of the GC cycle.
> So the external memory could only be released at infrequent points in time, and with large objects like images, this could be an issue.
>
> Putting a conditional breakpoint at the end of the function luaC_forcestep in lgc.c to display the number of finalizers called at each step (when non-zero) gave patterns like:
> 2013-10-20 15:46:57.565 Lua called finalizers of 4 objects
> 2013-10-20 15:46:57.915 Lua called finalizers of 4 objects
> 2013-10-20 15:46:58.871 Lua called finalizers of 4 objects
> 2013-10-20 15:46:59.933 Lua called finalizers of 4 objects
> 2013-10-20 15:47:00.600 Lua called finalizers of 4 objects
> 2013-10-20 15:47:01.043 Lua called finalizers of 4 objects
> 2013-10-20 15:47:01.987 Lua called finalizers of 4 objects
> 2013-10-20 15:47:02.682 Lua called finalizers of 4 objects
> 2013-10-20 15:47:02.930 Lua called finalizers of 4 objects
> 2013-10-20 15:47:04.021 Lua called finalizers of 4 objects
> 2013-10-20 15:47:04.708 Lua called finalizers of 4 objects
> 2013-10-20 15:47:06.703 Lua called finalizers of 4 objects
> 2013-10-20 15:47:08.508 Lua called finalizers of 4 objects
> 2013-10-20 15:47:09.204 Lua called finalizers of 4 objects
> 2013-10-20 15:47:10.099 Lua called finalizers of 4 objects
> 2013-10-20 15:47:11.104 Lua called finalizers of 4 objects
> 2013-10-20 15:47:11.827 Lua called finalizers of 1789 objects
>
> So a small number of finalizers are called at the end of each step (at most 4), and the rest of them are called when the "pause" state is reached.
>
> I understand that the VM doesn't have any hint of the cost of a finalizer call, hence the limit of 4. But logically, this limit should at least grow with the stepmul value.
>
> So, by replacing in luaC_forcestep:
>
> for (i = 0; g->tobefnz && (i < GCFINALIZENUM || g->gcstate == GCSpause); i++)
> GCTM(L, 1); /* call one finalizer */
>
> with:
>
> int maxfinalizenum = (GCFINALIZENUM * g->gcstepmul) / STEPMULADJ;
> for (i = 0; g->tobefnz && (i < maxfinalizenum || g->gcstate == GCSpause); i++)
> GCTM(L, 1); /* call one finalizer */
>
> the finalizer calls become slightly better balanced.
> E.g. with stepmul = 800, a typical pattern is now:
> 2013-10-22 15:08:52.996 Lua called finalizers of 16 objects
> 2013-10-22 15:08:53.136 Lua called finalizers of 16 objects
> 2013-10-22 15:08:53.276 Lua called finalizers of 16 objects
> 2013-10-22 15:08:54.514 Lua called finalizers of 16 objects
> 2013-10-22 15:08:54.739 Lua called finalizers of 16 objects
> 2013-10-22 15:08:54.922 Lua called finalizers of 16 objects
> 2013-10-22 15:08:55.066 Lua called finalizers of 16 objects
> 2013-10-22 15:08:55.332 Lua called finalizers of 16 objects
> 2013-10-22 15:08:55.667 Lua called finalizers of 16 objects
> 2013-10-22 15:08:55.815 Lua called finalizers of 16 objects
> 2013-10-22 15:08:55.962 Lua called finalizers of 16 objects
> 2013-10-22 15:08:56.108 Lua called finalizers of 16 objects
> 2013-10-22 15:08:56.250 Lua called finalizers of 16 objects
> 2013-10-22 15:08:56.428 Lua called finalizers of 16 objects
> 2013-10-22 15:08:56.601 Lua called finalizers of 319 objects
>
> We could even go further and define an extra GC parameter giving an estimate of the finalizer cost.
> Indeed, in the Lua to XXX bridge case, the finalizer will usually be a C function with a very small overhead. For example, for the Lua ObjC bridge, the finalizer is merely a call to the "release" method of the referenced object, with a very low cost. By setting an estimated finalizer cost, the program could control the number of finalizers called at each GC step and better balance when the external memory allocated to the program is released.
>
> What do you think of this? Could it be added in a future version of Lua?
>
> All other feedback on this topic is of course welcome. :-)
>
> Jean-Luc
>
>
>
>
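
For anyone checking the arithmetic in the patched loop quoted above, here is a minimal sketch of the per-step finalizer budget it computes. The constant values (GCFINALIZENUM = 4, STEPMULADJ = 200) are my assumption of what Lua 5.2's lgc.c defines, so check them against your own source tree; the helper name max_finalize_num is made up for illustration and is not part of Lua.

```c
#include <assert.h>

/* Assumed values from Lua 5.2's lgc.c; verify against your own copy. */
#define GCFINALIZENUM 4    /* finalizers called per GC step in the stock loop */
#define STEPMULADJ    200  /* divisor that turns stepmul into a scale factor */

/* Hypothetical helper: the per-step finalizer budget of the patched loop,
   i.e. the value the patch stores in 'maxfinalizenum'. */
static int max_finalize_num(int gcstepmul) {
    assert(gcstepmul >= 0);
    /* Integer division, exactly as in the quoted patch. */
    return (GCFINALIZENUM * gcstepmul) / STEPMULADJ;
}
```

Under those assumed constants, stepmul = 800 gives a budget of 16, which is consistent with the "16 objects" lines in the second log, and stepmul = 200 gives back the stock limit of 4.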
I've heard a few proposals before that the Lua GC could stand to have some way to tell it that a userdata has an additional n bytes associated with it, beyond the size of the userdata itself, so that it can better manage large objects. It seems like a good idea to me, as often a full userdata contains only a pointer to a large object allocated by C code, so Lua sees only 4-8 bytes used by something that's actually much larger. Did that idea ever go anywhere?
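
To make the idea concrete, here is a toy model of what such accounting might look like. It is only a sketch of the proposal: none of these names (ToyGCState, toy_new_userdata, toy_should_collect) exist in Lua, and the real collector's pacing is more involved than a single threshold.

```c
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical, not Lua API. */
typedef struct {
    size_t visible_bytes;   /* memory the collector can see (the userdata itself) */
    size_t external_bytes;  /* hypothetical hint: bytes owned by C code */
    size_t threshold;       /* start a cycle once the estimate reaches this */
} ToyGCState;

/* Register a userdata of 'ud_size' bytes that keeps 'extra' external bytes alive. */
static void toy_new_userdata(ToyGCState *g, size_t ud_size, size_t extra) {
    g->visible_bytes += ud_size;
    g->external_bytes += extra;
}

/* Decide whether a cycle should start, with or without the external hint. */
static int toy_should_collect(const ToyGCState *g, int count_external) {
    size_t estimate = g->visible_bytes;
    if (count_external)
        estimate += g->external_bytes;
    return estimate >= g->threshold;
}
```

With a 1 MB threshold and ten pointer-sized userdata that each hold a 512 KB image, the visible total is only a few dozen bytes, so the model never asks for a cycle unless the external bytes are counted as well.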