On 9-Jun-07, at 10:20 PM, Thomas Harning Jr. wrote:

I decided to move to using callbacks and having Lua code do coroutine
management... more flexible anyways.  It also helps manage the
lifetime of the coroutine, since I need to keep it in something to
prevent its collection.  I ran into some interesting issues where Lua
had freed the coroutine and it still ran, but segfaulted later on.
Valgrind was a great tool there...

Now that I have this callback system in place, I'm running into new
issues... it seems that the instruction pointer is getting messed up,
or the stack is being corrupted in some strange way.

I've attached a tarball of the code. Note that I'm using libevent-1.3b...
Hopefully the problems can be fixed easily... perhaps the library
could also get 'shrunk'... though for what it does, it's pretty good.

With the ldb-patched code + API assertions:
TRACE >: 	addserver
TRACE >: 	create
TRACE >: 	coro
TRACE >: 	setupHook
TRACE >: 	sethook
TRACE >: 	resume
TRACE 	thread: 0x52e280	>: 	nil
TRACE 	thread: 0x52e280	>: 	getWrapper
TRACE 	thread: 0x52e280	>: 	running
TRACE 	thread: 0x52e280	>: 	addevent
TRACE 	thread: 0x52e280	>: 	nil
TRACE 	thread: 0x52e280	>: 	yield
TRACE >: 	assert
TRACE >: 	loop
TRACE 	thread: 0x52e280	>: 	nil
TRACE 	thread: 0x52e280	>: 	resume
TRACE 	thread: 0x52e280	>: 	select
lua: lapi.c:502: lua_pushboolean: Assertion `(L->top <= L->ci->top) ||
luaAC_report(L, "Stack overflow -- see lua_checkstack")' failed.


Sadly, I think this API error message is misleading (I did say the patch
was experimental, right? :) ) although it definitely indicates an API
calling error. In this case, I believe that the error is that you're
effectively calling resume on the running coroutine, which is a no-no.

addevent is called from the coroutine, so the L that it stores in
the event structure is the coroutine, not the main state. That L
is retrieved by event_loop, which pushes a wrapper function onto
the stack and calls it; the wrapper function then tries to resume
the currently running thread. I think. I might try compiling
this on some machine and seeing if I can get ldb to provide more
useful information; it seems like an interesting test case for
a debugger.

Anyway, the TRACE information does seem to indicate that resume
is being called *from* the thread, which is presumably the only
thread which exists, and thus must be the thread being resumed.
That will probably end up scrambling the stack, and that
seems to have happened.
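
For reference, here is a minimal pure-Lua sketch of the same mistake: a coroutine that tries to resume itself. It is a hypothetical reproduction, not code from the tarball, and the function name try_self_resume is made up. With the stock coroutine library the inner resume is refused and reported as a failure rather than scrambling anything; I have not verified whether the C code in the tarball goes through that check.

```lua
-- Hypothetical reproduction; try_self_resume is not a name from the tarball.
local function try_self_resume()
  local co
  co = coroutine.create(function()
    -- Inside the body, co is the currently running thread,
    -- so this is a coroutine resuming itself.
    local ok, err = coroutine.resume(co)
    return ok, err
  end)
  -- The outer resume succeeds and passes the inner results through.
  local outer_ok, inner_ok, inner_err = coroutine.resume(co)
  return outer_ok, inner_ok, inner_err
end

local outer_ok, inner_ok, inner_err = try_self_resume()
-- inner_ok is false and inner_err is a message whose exact wording
-- depends on the Lua version.
print(outer_ok, inner_ok, inner_err)
```

The point is only that the resume has to come from outside the coroutine, i.e. from whatever state is driving the event loop.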

What you actually want, I guess, is for the libevent callback
to resume the coroutine which owns the socket. (That works for
server and client sockets, but not for proxy sockets; that's
a different issue, though.) That means that you need to create
the socket and the libevent object in one go, with the socket
attached to a coroutine. The master lua_State and a reference to
the coroutine are stored in the libevent callback userdata. Then
the coroutine is started and runs until it yields, at which point
the event is activated. At least, that sounds to me like it would
work. Hope it helps a bit.
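
A rough sketch of that arrangement, in plain Lua with the event loop simulated by a table, might look like this. Everything here is an assumption for illustration: pending, add_event, spawn and dispatch are made-up names, and the real thing would keep the coroutine reference in the libevent callback userdata rather than in a Lua table. The table also happens to keep each waiting coroutine reachable, so it cannot be collected while an event is outstanding.

```lua
-- All names below are hypothetical; libevent is simulated by a table.
local pending = {}  -- event id -> coroutine waiting on it (also keeps it alive)

-- Called from inside a coroutine: register the caller, then suspend.
local function add_event(id)
  pending[id] = coroutine.running()
  return coroutine.yield()
end

-- Start a coroutine and run it until its first yield.
local function spawn(fn, ...)
  local co = coroutine.create(fn)
  assert(coroutine.resume(co, ...))
  return co
end

-- Called from the main loop only, never from inside a coroutine:
-- resume the coroutine that owns the event.
local function dispatch(id, ...)
  local co = pending[id]
  if not co then return false end
  pending[id] = nil
  assert(coroutine.resume(co, ...))
  return true
end

local log = {}
spawn(function(name)
  local data = add_event("sock1")
  log[#log + 1] = name .. " got " .. data
end, "client")

dispatch("sock1", "hello")
print(log[1])
```

The important property is that dispatch runs in the main state, so the thread doing the resuming is never the one being resumed.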

If you figure out which API error was not being checked by
my patch, let me know so that I can add it to the tests.
I'm going to take a look at the "resume resumes itself"
error tomorrow and maybe update the patch if I can figure
out an easy solution.

R.