On 30.12.2013 03:38, Rena wrote:

The design is:
-A Thread class keeps track of the child threads that we created (storing
things like the thread's current state, the pthread_t, the Lua state
running in that thread, etc) (note that I'm using "thread" to refer to the
OS thread and "Thread" to refer to the class instance keeping track of it)
-Each child thread runs its own Lua state
-The child threads can themselves create child threads
-When a thread is created, we malloc() a Thread object, and push a full
userdata which is just a pointer to that Thread (since light userdata can't
have metatables)
-The main thread has a Thread as well, to keep track of its children
-Each Thread has a linked-list of child threads it created
-When a thread userdata's __gc fires, we free the userdata and set a flag
in the Thread that tells that it's been collected
-When the child thread is finished running and sees that it's been
collected, it does some cleanup (closes its Lua state, etc) and exits
-When a thread exits, it iterates its child list, telling each thread to
shut down once it's finished, and waiting until they've all exited (using
pthread_join), only shutting down after all of its children have shut down.
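
To make the bookkeeping above concrete, here is a minimal sketch in C of a Thread record with a child list and the join-all-children step at exit. The field and function names (`thread_spawn`, `thread_join_children`, `collected`, and so on) are hypothetical and not taken from Rena's code, the Lua state is only an opaque pointer, and the userdata/`__gc` side is left out entirely. It assumes that only the owning thread ever touches its own child list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct lua_State lua_State; /* opaque here; normally from lua.h */

/* Hypothetical per-thread record, one per OS thread. */
typedef struct Thread {
    pthread_t tid;
    lua_State *L;             /* Lua state run by this thread (unused here) */
    int collected;            /* would be set by the userdata's __gc */
    struct Thread *children;  /* head of the list of Threads we created */
    struct Thread *next;      /* next sibling in the parent's list */
} Thread;

/* Join and free every child; returns how many were joined. */
int thread_join_children(Thread *parent) {
    int n = 0;
    while (parent->children != NULL) {
        Thread *c = parent->children;
        parent->children = c->next;
        pthread_join(c->tid, NULL);  /* blocks until the child has exited */
        free(c);
        n++;
    }
    return n;
}

static void *thread_main(void *arg) {
    Thread *self = arg;
    /* a real implementation would run a Lua script in self->L here */
    thread_join_children(self);  /* exit only after our own children */
    return NULL;
}

/* Create a child thread and link it into the parent's list. */
Thread *thread_spawn(Thread *parent) {
    Thread *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;
    if (pthread_create(&t->tid, NULL, thread_main, t) != 0) {
        free(t);
        return NULL;
    }
    t->next = parent->children;
    parent->children = t;
    return t;
}
```

In this sketch a Thread is only freed when its parent joins it, which is exactly the behaviour that turns into a leak for a long-running parent, as described further down.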

It seems simple enough: when __gc fires, we could kill the thread and
delete it from the child list right there. The trick is, I don't want to do
that, because:

1) __gc doesn't fire immediately after the reference is dropped, but some
time later when the collector runs. This would lead to seemingly random
deaths of child threads.

2) A pthread can't really be killed. There's only pthread_cancel which
*requests* that the thread shut down, and it will do so at the next
cancellation point, which might not come for a long time if it's executing
some long-running script. This would also cause the parent thread to be
delayed until the child finishes.

You *could* kill a pthread (by switching it to asynchronous cancellation with `pthread_setcanceltype`), but you shouldn't, because asynchronous cancellation is flawed by design and can lead to deadlocks and/or broken invariants. `pthread_cancel` does not wait for the actual cancellation, so the parent thread doesn't have to wait in either case (`pthread_join` *will* wait for the thread to die, however) ...
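
A small sketch of the difference, assuming the default (deferred) cancellation type: `pthread_cancel` only queues the request and returns, the worker acts on it at its next cancellation point, and `pthread_join` is the call that actually blocks. The names `worker` and `cancel_and_join` are made up for this illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Worker that never finishes on its own but offers a cancellation point. */
static void *worker(void *arg) {
    (void)arg;
    for (;;) {
        /* long-running work would go here */
        pthread_testcancel();  /* explicit cancellation point */
    }
}

/* Returns 0 if the worker was cancelled, 1 if it exited some other way,
 * and -1 if a pthread call failed. */
int cancel_and_join(void) {
    pthread_t tid;
    void *result = NULL;
    if (pthread_create(&tid, NULL, worker, NULL) != 0) return -1;
    if (pthread_cancel(tid) != 0) return -1;         /* returns immediately */
    if (pthread_join(tid, &result) != 0) return -1;  /* waits for the exit */
    return result == PTHREAD_CANCELED ? 0 : 1;
}
```

A script that never reaches a cancellation point would keep running after the `pthread_cancel` call, and only the `pthread_join` would stall the parent.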


3) I wanted the parent thread to be able to use a "fire-and-forget" design:
create a thread and let it go off to work on something without having to
keep track of it.

That's what `pthread_detach` is for, but I guess you want to wait for all running threads before shutting down the main thread, right?
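
As a rough sketch of that combination: the threads are detached, so nobody has to join them, and a counter protected by a mutex lets the main thread wait until all of them have finished before it shuts down. The counter, the condition variable and the function names are assumptions made for this example only.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int running = 0;  /* number of detached threads still working */

static void *task(void *arg) {
    (void)arg;
    /* the actual work would go here */
    pthread_mutex_lock(&lock);
    if (--running == 0)
        pthread_cond_broadcast(&all_done);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start a detached thread; returns 0 on success. */
int spawn_detached(void) {
    pthread_t tid;
    pthread_mutex_lock(&lock);
    running++;  /* count it before it can possibly finish */
    pthread_mutex_unlock(&lock);
    if (pthread_create(&tid, NULL, task, NULL) != 0) {
        pthread_mutex_lock(&lock);
        running--;
        pthread_mutex_unlock(&lock);
        return -1;
    }
    return pthread_detach(tid);  /* resources are reclaimed at exit */
}

/* Block until every detached thread has reported completion. */
int wait_for_all(void) {
    int left;
    pthread_mutex_lock(&lock);
    while (running > 0)
        pthread_cond_wait(&all_done, &lock);
    left = running;
    pthread_mutex_unlock(&lock);
    return left;
}
```

Note that `wait_for_all` only guarantees that each thread has finished its work and decremented the counter; the thread may still be executing its last few instructions when the wait returns.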


So instead, we let the child thread continue to run after __gc and shut
down once it's finished.

Here's the problem: If the Thread struct in the child list doesn't get
deleted when __gc fires, then when *does* it get deleted? I'm envisioning a
scenario such as a server, where the parent thread might be running for a
very long time, creating many thousands of threads which run some brief
task (such as serving a request) and shut down.

If I have the child thread delete itself at shutdown, it raises concurrency
issues. If the child thread happens to finish execution and shut down just
as its __gc fires, then the __gc will try to set the "collected" flag on a
deleted thread -> segfault. I can't have __gc use a mutex to wait for the
thread to shut down either, since it might not shut down for a long time
yet, and I don't want the parent thread to block waiting for it. For
example in a server this means a long-running request could effectively
block all other requests until it finished.

When the parent thread shuts down, it will loop through its list of
children and pthread_join() all of them waiting for them to shut down. At
this point it'd be safe to delete them. Trouble is, if the parent thread
runs for a very long time and creates thousands of threads, then this is
basically a memory leak - all those threads which have shut down ages ago
will still have a Thread object in the child list until the server is shut
down.


I'd use the membership in the parent's list as an indicator of whether a thread should detach itself before exiting, and a reference count for determining whether a child thread can free its Thread structure itself or has to let a `__gc` metamethod handle that. You will still need to protect the parent's list and the reference count with mutexes, but only for as long as it takes to read/modify them.
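
Here is a minimal sketch of the reference-count half of that idea, with made-up names (`thread_release`, `child_main`, `refcount_demo`). Each Thread starts with two references, one held by the Lua userdata and one by the running OS thread, and whichever side drops the last one frees the structure. The list membership and self-detach part is left out, and the `__gc` call is simulated by a plain function call, so treat it as an illustration rather than a complete design.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct Thread {
    pthread_mutex_t lock;  /* held only while touching refs */
    int refs;              /* one for the userdata, one for the OS thread */
} Thread;

static int freed_count = 0;  /* demo only: counts freed Thread structs */

/* Drop one reference; whoever drops the last one frees the struct. */
static void thread_release(Thread *t) {
    pthread_mutex_lock(&t->lock);
    int remaining = --t->refs;
    pthread_mutex_unlock(&t->lock);
    if (remaining == 0) {  /* nobody else can reach t any more */
        pthread_mutex_destroy(&t->lock);
        free(t);
        freed_count++;
    }
}

static void *child_main(void *arg) {
    Thread *self = arg;
    /* the Lua script would run here */
    thread_release(self);  /* give up the thread's own reference */
    return NULL;
}

/* Returns the total number of Thread structs freed so far, or -1 on error. */
int refcount_demo(void) {
    pthread_t tid;
    Thread *t = malloc(sizeof *t);
    if (t == NULL) return -1;
    pthread_mutex_init(&t->lock, NULL);
    t->refs = 2;
    if (pthread_create(&tid, NULL, child_main, t) != 0) {
        pthread_mutex_destroy(&t->lock);
        free(t);
        return -1;
    }
    thread_release(t);        /* stands in for the userdata's __gc */
    pthread_join(tid, NULL);  /* demo only, so the result is deterministic */
    return freed_count;
}
```

Neither side ever waits for the other: the mutex is held just long enough to decrement the counter, and it does not matter whether the simulated `__gc` or the child's exit happens first.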


Philipp