You could set up a hook for each thread and use longjmp in each
handler to jump back and forth between threads. However, I think this
will be very inefficient unless you use large instruction counts.
You would also need to set up your own cooperation scheme. Also, this
won't protect you much from time-consuming C calls (if that is something
you want to avoid).
Time-consuming C calls won't be a problem.
I was able to hack luaV_execute so that it executes a single instruction and sets the internal state to LUA_YIELD, instead of running the entire for (;;) loop. I can then call lua_resume to step one instruction at a time. It seems to work so far for simple programs, but I guess I'll see what problems crop up.
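
To illustrate the general idea (not the actual patch), here is a rough sketch of a toy stack machine in C whose step function executes exactly one instruction and then hands control back to the caller, which is roughly what a one-instruction luaV_execute plus lua_resume would give you. Everything in it (struct vm, vm_step, run_round_robin, the opcodes, the VM_YIELD/VM_DONE codes) is a hypothetical name made up for this sketch; none of it is Lua's real internals or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, loosely analogous to LUA_YIELD vs. finished. */
enum vm_status { VM_DONE = 0, VM_YIELD = 1 };

/* A made-up three-opcode instruction set for the toy machine. */
enum opcode { OP_PUSH, OP_ADD, OP_HALT };

struct instr { enum opcode op; int arg; };

struct vm {
    const struct instr *code;  /* program being run */
    size_t ncode;              /* number of instructions in code */
    size_t pc;                 /* index of the next instruction */
    int stack[16];             /* small fixed-size operand stack */
    size_t sp;                 /* number of values on the stack */
    int halted;                /* set once OP_HALT has executed */
};

static void vm_init(struct vm *vm, const struct instr *code, size_t ncode)
{
    vm->code = code;
    vm->ncode = ncode;
    vm->pc = 0;
    vm->sp = 0;
    vm->halted = 0;
}

/* Execute exactly one instruction and return, instead of looping in a
 * for (;;) until the program finishes. All state lives in *vm, so the
 * caller can resume later simply by calling vm_step again. */
static enum vm_status vm_step(struct vm *vm)
{
    if (vm->halted)
        return VM_DONE;
    assert(vm->pc < vm->ncode);
    const struct instr *i = &vm->code[vm->pc];
    switch (i->op) {
    case OP_PUSH:
        assert(vm->sp < 16);
        vm->stack[vm->sp++] = i->arg;
        break;
    case OP_ADD:
        assert(vm->sp >= 2);
        vm->stack[vm->sp - 2] += vm->stack[vm->sp - 1];
        vm->sp--;
        break;
    case OP_HALT:
        vm->halted = 1;
        return VM_DONE;
    }
    vm->pc++;
    return VM_YIELD;  /* one instruction done; give control back */
}

/* Step every live machine once per pass until all have halted.
 * Returns the total number of vm_step calls made. */
static size_t run_round_robin(struct vm *vms, size_t n)
{
    size_t calls = 0;
    size_t live;
    do {
        live = 0;
        for (size_t k = 0; k < n; k++) {
            if (vms[k].halted)
                continue;
            if (vm_step(&vms[k]) == VM_YIELD)
                live++;
            calls++;
        }
    } while (live > 0);
    return calls;
}
```

A caller would vm_init each machine with its program and then either call vm_step directly or hand the array to run_round_robin. Stepping one instruction per switch gives the finest interleaving but also the most switching overhead, which is the same trade-off as the instruction-count concern above.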
--Mike