So if I understand correctly, the metatable or table is not changed; what changes is only the presence of the object in the "list" of objects to be finalized, which is filled at the start of the mark phase with all known objects. Objects are then removed from the list when they are reached from the stack and marked as reachable.

At the end of the mark phase, what remains is a list of unreachable objects that will need to be finalized; then the finalization step starts, which takes each object from the list and removes it, then calls the finalizer if there is one. But is there any action in the finalizer that determines whether the object will then be swept?

The ONLY action I see is the fact that it calls setmetatable(); you are saying that this does NOT change the metatable, strange!

But it must also do something else that marks the object as not to be swept; however, the call to setmetatable() is not the end of the finalizer, which has still not returned to the GC sweeper: the finalizer may still change the state of the metatable *after* calling setmetatable(), so it could still set or remove its "__gc" entry. And nothing else happens before the finalizer returns, so there is nothing that can actually set the required bit/flag property in the object itself properly. Let's suppose that the GC then inspects the metatable at the end to see if there's a __gc entry mapped to a function: how can it determine that the function called setmetatable() or changed the entry in its metatable, and differentiate that from a finalizer that did nothing at all? There must be an action taken by the finalizer that effectively indicates to the GC that the object must not be swept and is instead marked for later finalization.

The finalizer may also resurrect that object by linking it to another "live" object (i.e. a reachable object that has already been marked) without calling setmetatable() at all, but it can also still set or reset the __gc entry of its existing metatable.

All we know is that an object has a "state", which is one of: active but not yet marked (possible only at the start of or during the mark phase, impossible during the sweep phase), active and marked, dead to finalize, finalized to sweep, or resurrected (to be made active but not yet marked again at the end of the sweep phase). This state is not enough to determine whether what a finalizer does (or does not do) will cause the object to be swept or to be finalized again later.

The only reliable info is that, just before calling the finalizer, the GC will clear the link from the object to its metatable: it is then up to the finalizer to reattach the metatable by calling setmetatable() with a suitable __gc entry attached to a finalizer function (not necessarily the same function as the current finalizer itself). If there's no such call to setmetatable(), or if the finalizer clears the __gc entry or sets it to a non-function, and if the object has not been resurrected by the finalizer by linking it to an object with a "marked" status or an object with a "dead to finalize" status (processed later in the same sweep cycle), then the object will be swept by the GC just after the finalizer has returned.

That's what is not clearly documented: what is the effective status of the object which differentiates an object being finalized to indicate to the GC that it must not be swept after calling the finalizer? There must be an action taken by the finalizer itself, but by default if this action is not taken by the finalizer, then the finalization will be immediately followed by sweeping.

And the only such action I see is the finalizer calling setmetatable() to set or restore the metatable that the GC detached from the object just before calling the finalizer (simply by clearing the internal pointer to the object's metatable); so when the finalizer calls setmetatable() with a non-nil value, this has the desired effect of indicating to the GC that the object must not be swept.

E.g.:
- a TCP network session socket that has been closed but is still kept for about one minute in FIN_WAIT state, during which that socket may still be resurrected in order to reuse its allocated port number and allow a fast restart with its existing reception/transmission windows and MTU: this can be useful for security against DoS attacks, to keep a server from exhausting all its port number resources, but also for privacy reasons, to secure all sessions.
- another usage is to allow closed files some delay before they get physically flushed, or because the flush itself may be long and may need to be tested and retried several times before giving up and logging severe errors to inform the user or the program itself that something bad happened asynchronously, without forcing close() to block until flushing is fully completed.
- another usage may be to delay the power-down of a previously used device (e.g. turning off a screen display after several minutes when there is no longer any new message to display), because turning the device on again may be very lengthy if it was turned off immediately after a close.
- another usage may be to deallocate other OS or external resources (e.g. returning local memory used by Lua to the OS by forcing all "weak" objects to be deallocated, including for example caches, or deleting caches stored in the filesystem that have outlived a "grace delay" during which they could still be reused).
- another usage would be to start a reorganization/optimization/defragmentation of the storage, or to physically clear storage entries that are no longer in use: this could be I/O intensive on large volumes, so such clearing is done after a grace period, when it can be performed more easily and with lower impact by doing it sequentially instead of in random order on disk.

Basically, finalizers are there to delay operations that can be postponed without blocking a program that no longer immediately needs an object. This still allows a program to reconstruct the object (notably "weak" objects for caches) much faster if the underlying structures were not cleared and their finalization was delayed for a grace period.

What you quote explains just that there are lists of objects from which candidates are extracted, but it still does not indicate clearly which action a finalizer takes to effectively change the state of the object so that the GC will not sweep it when the finalizer returns. The GC must then have already modified the state of the object (to indicate that it MUST be swept) just before calling the finalizer, and the finalizer optionally decides to change that state again and indicate that it now MUST NOT be swept by the GC: the finalizer itself cannot change the various lists of objects maintained only by the GC itself, and it cannot change the object's "generation" (if generations are used in Lua 5.4 to subdivide the lists of objects into smaller subsets, where GC and finalization will be faster on young objects than on objects in older generations, which have survived more than one cycle and are less likely to need sweeping soon).


On Sun, 18 Nov 2018 at 23:27, nobody <nobody+lua-list@afra-berlin.de> wrote:
On 18/11/2018 15.34, Philippe Verdy wrote:
> It's not very well documented, but when a finalizer gets called on an
> object, just before calling it, the GC first clears the associated
> metatable if the object being finalized is a table: in the finalizer
> for an object whose type is 'table' or 'userdata', if you use
> getmetatable(self), it's not documented clearly if either you'll get
> nil, or you'll get the same metatable whose "__gc" entry is now nil,
> something that should be better, allowing you to store the "cnt"
> variable inside the metatable itself along with the "__gc" variable,
> instead of the object being finalized).

That's complete nonsense.  Any modification of the metatable would be
unsafe as these are commonly used on several objects (though not in this
example), so the collection / finalization of the first such object
would break the finalization of all other objects with the same shared
metatable.
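
A quick way to check this empirically is the sketch below. It assumes Lua 5.3 or later (tables with `__gc`) and a full `collectgarbage()` that runs pending finalizers; the names `shared`, `checks` and `make` are made up for the illustration. Two objects share one metatable, and each finalizer records whether it still sees that metatable with its `__gc` entry in place:

```lua
-- Sketch, assuming Lua 5.3+; all names here are illustrative only.
local shared = {}   -- one metatable shared by several objects
local checks = {}   -- one entry per finalizer call

shared.__gc = function(obj)
  -- Record whether the finalizer still sees the shared metatable intact.
  checks[#checks + 1] = (getmetatable(obj) == shared)
                        and (type(shared.__gc) == "function")
end

-- Create the object inside a function so that no reference to it
-- lingers in the caller's stack frame.
local function make()
  setmetatable({}, shared)
end

make()
make()
collectgarbage()
collectgarbage()

print(#checks, checks[1], checks[2])
```

If the collector cleared the metatable (or its `__gc` entry) before the first finalizer ran, the second object could not be finalized through the same metatable.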

See §2 of https://www.lua.org/manual/5.3/manual.html#2.5.1 which says:

> For an object (table or userdata) to be finalized when collected, you
> must mark it for finalization. You mark an object for finalization
> when you set its metatable and the metatable has a field indexed by
> the string `"__gc"`. Note that if you set a metatable without a
> `__gc` field and later create that field in the metatable, the object
> will not be marked for finalization.

And §3

> When a marked object becomes garbage, it is not collected immediately
> by the garbage collector. Instead, Lua puts it in a list. After the
> collection, Lua goes through that list. For each object in the list,
> it checks the object's __gc metamethod: If it is a function, Lua
> calls it with the object as its single argument; if the metamethod is
> not a function, Lua simply ignores it.

And further §5

> Because the object being collected must still be used by the
> finalizer, that object (and other objects accessible only through
> it) must be resurrected by Lua. Usually, this resurrection is
> transient, and the object memory is freed in the next
> garbage-collection cycle. However, if the finalizer stores the object
> in some global place (e.g., a global variable), then the resurrection
> is permanent. Moreover, if the finalizer marks a finalizing object
> for finalization again, its finalizer will be called again in the
> next cycle where the object is unreachable. In any case, the object
> memory is freed only in a GC cycle where the object is unreachable
> and not marked for finalization.

(I wouldn't call that "not very well documented"…)

Rehashed in other (simpler?) words:

If, when you setmetatable(), there's _anything_ non-nil at `__gc` in the
metatable, the thing gets flagged for finalization.  (This is a property
of the table/userdata, not the metatable.)

When the thing is later collected and it has the "to be finalized" bit
set, this bit is cleared and, if _at this point_ the value at `__gc` in
the metatable is a function, that function gets run.
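
The two paragraphs above can be sketched as three small cases. This is only an illustration, assuming Lua 5.3 or later; `ran`, `case_a`, `case_b` and `case_c` are hypothetical names. Case A has `__gc` set before setmetatable(), case B adds it only afterwards, and case C has a non-nil placeholder at setmetatable() time that is replaced by a function later:

```lua
-- Sketch, assuming Lua 5.3+; all names here are illustrative only.
local ran = {}

-- Case A: __gc is present when setmetatable() is called -> flagged.
local function case_a()
  local mt = { __gc = function() ran.a = true end }
  setmetatable({}, mt)
end

-- Case B: __gc is added after setmetatable() -> never flagged.
local function case_b()
  local mt = {}
  setmetatable({}, mt)
  mt.__gc = function() ran.b = true end
end

-- Case C: a non-nil placeholder at setmetatable() time flags the object;
-- the function found at __gc at collection time is the one that runs.
local function case_c()
  local mt = { __gc = true }
  setmetatable({}, mt)
  mt.__gc = function() ran.c = true end
end

case_a()
case_b()
case_c()
collectgarbage()
collectgarbage()

print(ran.a, ran.b, ran.c)
```

Under those assumptions only the finalizers of cases A and C should run; `ran.b` stays nil because that object was never flagged.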

(And no matter what it'll do, the object survives until the next
collection.  Now _usually_, the "to be finalized" bit isn't re-enabled
by the `__gc` method and so the thing will be collected normally by the
next cycle… but you can re-flag it (by again calling setmetatable()
using a metatable with a `__gc` field), and even keep it around
indefinitely in an "undead" state – it's "dead" / fully unreachable from
the rest of the Lua state (hooks don't run during `__gc`), but it can
still do arbitrary stuff with the state.)
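
As a minimal sketch of the re-flagging described above (again assuming Lua 5.3 or later, with made-up names `runs`, `mt` and `make`), the finalizer below re-flags its object on the first two calls only, so it should run in three consecutive collections and the object is then collected normally:

```lua
-- Sketch, assuming Lua 5.3+; all names here are illustrative only.
local runs = 0
local mt = {}

mt.__gc = function(obj)
  runs = runs + 1
  if runs < 3 then
    -- Re-flag: the finalizer will be called again in the next cycle
    -- in which obj is unreachable.
    setmetatable(obj, mt)
  end
  -- On the third call the object is not re-flagged, so a later
  -- cycle simply frees it.
end

-- Create the object inside a function so nothing keeps it reachable.
local function make()
  setmetatable({}, mt)
end

make()
for _ = 1, 5 do
  collectgarbage()
end

print(runs)
```

Note that the finalizer never stores `obj` anywhere; the object stays unreachable the whole time, and only the setmetatable() call decides whether it gets another finalization round.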


A fun / silly use of that is to make the computer beep on every
collection cycle:

setmetatable( {}, { __gc = function(t)
   io.stderr:write("\7")             -- BEL character: makes the terminal beep
   setmetatable(t, getmetatable(t))  -- re-flag t so this runs again next cycle
end } )

(This is easy to pre-load via the `-e` / `-l` options, and might be
useful for debugging… in fact, the Lua tests do something similar, just
writing a '.' for every collection instead of making it beep.)


You might also (ab)use this to trigger bookkeeping tasks (once per GC
cycle), if you have no better way to do that.  (A fixed "every $n
invocations of a function" scheme might not work (it could fire _both_
too rarely and too often, at different times), and in certain restricted
situations (games etc.), this might be as good as it gets… but note that
this is slightly racy – _any_ allocation can trigger a GC cycle, so
protect your data structures / make sure you're not reading inconsistent
state when triggered in the middle of some change.)
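
A possible shape for such a once-per-cycle hook is sketched below, assuming Lua 5.3 or later. `install_gc_hook`, `on_gc_cycle` and `cycles` are hypothetical names, not part of any Lua API. The sentinel re-arms itself before running the callback, and the callback is wrapped in pcall() so that an error in the bookkeeping code does not escape from the finalizer:

```lua
-- Sketch, assuming Lua 5.3+; all names here are illustrative only.
local cycles = 0

-- Hypothetical bookkeeping task; keep it short and careful, since it
-- can be triggered by any allocation in the program.
local function on_gc_cycle()
  cycles = cycles + 1
end

local function install_gc_hook(callback)
  local mt = {}
  mt.__gc = function(sentinel)
    setmetatable(sentinel, mt)  -- re-arm first, for the next cycle
    pcall(callback)             -- do not let errors escape the finalizer
  end
  setmetatable({}, mt)          -- unreachable sentinel, flagged once here
end

install_gc_hook(on_gc_cycle)
for _ = 1, 4 do
  collectgarbage()
end

print(cycles)
```

Each full collection should call the callback once; automatic cycles triggered by allocation would add to the count, so code relying on this should not assume an exact number of calls.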


And of course there's LOTS of other stuff that you can do…

-- nobody