My finalizers can be
called in any order and at any time during garbage collection, before
the userdata object is freed.
I'm thinking (hoping) that with these less strict requirements it would
be possible to get rid of the stall in atomic().

I am not sure this will help. I guess that what causes the stall is
not that the handling of each userdata is complex, but that all those
10,000 userdata must be handled atomically. (The key function here
is 'luaC_separateudata'.) I think the solution would be to do that
separation phase incrementally too, but I don't know how to do it.
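
To make "incrementally" concrete, here is a toy sketch in C of a
separation pass that examines only a bounded number of objects per call
and remembers where it stopped. It is not the code in
'luaC_separateudata': the names Obj, Separator, sep_init and sep_step
and the fields 'marked' and 'has_finalizer' are all hypothetical. It
also leaves out the hard part, namely that the program may run between
two steps and create objects or change which ones are reachable; the
sketch assumes the list and the marks stay frozen until the pass ends.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object; this is NOT Lua's real userdata layout. */
typedef struct Obj {
  struct Obj *next;
  int marked;         /* nonzero if the object is still reachable */
  int has_finalizer;  /* nonzero if a finalizer is still pending */
} Obj;

/* State that survives between steps (all names are assumptions). */
typedef struct Separator {
  Obj **cursor;   /* link that points at the next object to examine */
  Obj *tobefnz;   /* objects already selected for finalization */
  Obj **tail;     /* link where the next selected object is appended */
} Separator;

static void sep_init(Separator *s, Obj **list) {
  s->cursor = list;
  s->tobefnz = NULL;
  s->tail = &s->tobefnz;
}

/* Examine at most 'budget' objects. Dead objects with a pending
   finalizer are unlinked and appended to 'tobefnz' in list order.
   Returns 1 once the whole list has been scanned, 0 otherwise. */
static int sep_step(Separator *s, int budget) {
  assert(budget > 0);
  while (budget-- > 0) {
    Obj *o = *s->cursor;
    if (o == NULL) return 1;        /* reached the end of the list */
    if (!o->marked && o->has_finalizer) {
      *s->cursor = o->next;         /* unlink from the main list */
      o->next = NULL;
      *s->tail = o;                 /* append to the finalization list */
      s->tail = &o->next;
    } else {
      s->cursor = &o->next;         /* keep the object and move on */
    }
  }
  return *s->cursor == NULL;
}
```

A collector built this way would call sep_step with a small budget from
each incremental step instead of walking all 10,000 userdata at once.
Whether that can be made correct in the real collector depends on
handling the changes that happen between steps, which this sketch does
not attempt.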