Resumable Vm Patch

lua-users home

This is a patch to make the Lua VM fully resumable. It solves most of the practical problems with coroutines still present in [Lua 5.1].

In particular, you can now:

This is a new and fully portable approach to this problem. It combines ideas from Eric Jacobs 'improved coroutines' patch and experiences gained from working on my 'true C coroutines' patch. The changes to the Lua core are as little intrusive as possible, owing to a flexible yield mechanism. One side-effect is a faster pcall() and a slightly faster Lua VM in general.

License notice: I hereby declare that all work I contributed to the Lua core is to be governed under the same license as the Lua core itself. -- MikePall

Greg Falcon aka VeLoSo has offered to be the new maintainer of the patch. Please direct all questions about the patch to him.

I've decided to continue my work on a different approach to the yield-across-C problem. Please have a look at [Coco], too. Both approaches have their benefits and drawbacks. For more info search the [Lua mailing list archive] for "Coco" and/or "RVM". -- MikePall

I second this. [Coco] has a good many benefits over this patch. It is a much more elegant approach, it is better tested, it requires no modifications to your Lua C functions to make them resumable, it works on a great many platforms, and it is required for [LuaJIT 1.x]. Coco is probably the correct choice for your project if you need improved coroutine support.

That said, I think this patch is valuable because of its ANSI C approach, which allows it to work on all platforms where Lua builds, and which might help lead to a future Lua core with better coroutine behavior. -- VeLoSo

LuaFiveTwo implements an approach like this [1].


Click to download the patch [versus 5.1 final].

Update 2006-04-16 VeLoSo: Patch against Lua 5.1 final. No major changes. I did remove some performance patches that did not further the goal of a resumable VM (for clarity's sake). I updated this patch for 5.1 with a good amount of care, though it's possible I may have missed some subtle point. Buyer beware.

Update 2005-05-24 MikePall: Patch against Lua 5.1-work6. No major functional changes, but some internal reorganization. It's now possible get a traceback from a dead coroutine (like in the baseline). The new operators in work6 (*s and a%b) are of course resumable, too.

A number of fixes for the Lua 5.1-work6 baseline have been included in the patch: MSVC number2int fix, *s performance improvement, remove undefined lua_checkpc assertion.


The Lua API has not been changed and there is no need to change anything in existing Lua code. In particular you can now put code that uses pcall() inside a coroutine and still yield from the protected functions without any code change.

There is no need to change existing C code, too. But of course existing functions are not fully resumable because the old C API did not provide such a capability. This is only a concern if your C functions need callbacks (lua_call) or protected calls (lua_pcall) and you want to yield from the called functions.

There is a handful of new C API functions that allow you to make your C functions resumable. The conversion is pretty straightforward. E.g. I had to change only a couple of lines each to make the standard Lua library functions resumable. See below for a tutorial.

How it Works

The changes to the Lua core are non-intrusive due to a special trick in the new yield mechanism: Whenever there are intervening (but resumable) C call boundaries, the yield mechanism just throws a 'yield error' to quickly unfold the C stack. There is no need to check for a yield condition on return of each and every function call all the way through the bottom of the C stack.

This helps minimize the changes to the VM core and makes it easy to convert over existing C functions. There is no performance impact on the standard control flow (i.e. without yielding).

BTW: The yield throw mechanism is not used in the standard situation of a coroutine.yield() to avoid the overhead of longjmp.

The protected call mechanism has been changed to avoid using a setjmp wrapper whenever there is already such a wrapper in the C stack. Unwinding the C stack and the Lua stack are now two separate concerns. This makes pcall() as cheap as a function call: ~10-15% faster on x86 and even more on CPUs with many registers, e.g. ~30% on IA64 (Itanium).

I've done quite a bit of instruction profiling and cache-miss profiling and derived several tiny optimizations to the Lua core. E.g. coroutines need ~10% less memory.

I fixed a few bugs/misfeatures, too:

Note that the patch looks a bit lengthy due to many one-line changes and some code that had to be moved around in ldo.c. Really, there isn't that much new code except for the new stack unwind mechanism and the VM opcode resume handling.

Lua API Extensions

There is only one new function:

  epcall(err, f, arg1, arg2, ...)
It is intended as a saner replacement for the odd xpcall() API that does not allow you to pass arguments to a protected function when you want to set an error handler, too. This implied that you needed to artificially create closures for use with xpcall(). epcall() has no such restrictions.

epcall() behaves in every respect like pcall(), except that it sets an error handler function to be called before the Lua stack is unwound (like xpcall() does).

xpcall() is retained for compatibility only.

Another minor change is that you get a different error message when a yield fails:

This better reflects the circumstances when this error happens now (if at all).

C API Extensions

A new set of lua_v* and lua_i* functions augment lua_call(), lua_pcall() and lua_yield():

  void   lua_call (lua_State *L, int nargs, int nresults);
| void  lua_vcall (lua_State *L, int nargs, int nresults, void *ctx);
| void  lua_icall (lua_State *L, int nargs, int nresults, int ictx);
  void  lua_pcall (lua_State *L, int nargs, int nresults, int ef);
| void lua_vpcall (lua_State *L, int nargs, int nresults, int ef, void *ctx);
| void lua_ipcall (lua_State *L, int nargs, int nresults, int ef, int ictx);
  int   lua_yield (lua_State *L, int nargs);
| int  lua_vyield (lua_State *L, int nargs, void *ctx);
| int  lua_iyield (lua_State *L, int nargs, int ictx);
(Don't worry, most of these are just macros with proper casts.)

The new functions take a context argument (either a void * or an int) which can be used to save the current state of the running C function. Using these API functions with a non-NULL/non-zero context argument indicates to the Lua core that your C function is resumable (i.e. the callbacks are allowed to yield).

The classic non-resumable API functions lua_call(), lua_pcall() and lua_yield() are just macros, passing a NULL context argument to the lua_v* equivalents.

Two new API functions can be used to retrieve the context:

| void *lua_vcontext (lua_State *L);
| int   lua_icontext (lua_State *L);
On the first call of a C function the context is initialized to NULL/zero. When a coroutine yields and is resumed again, a resumable C function is simply called again. But this time the context is guaranteed to be non-NULL/non-zero and reflects the saved state of the C function (or the error number in case an error is caught by lua_vpcall/lua_ipcall).

You have to be aware that the C stack including your C function may be unwound when you use the new API functions for callbacks. This can only happen when the callback (or any function called from it) yields. In this case the control flow never returns to your C function from the new API call. Instead your C function will be called again when the coroutine is resumed.

This means that you have to save all context your C function keeps. Either by passing a specific context argument to the API call (a flag, an index/counter or a pointer) and/or by saving the context on the Lua stack (before making the API call). See below for a tutorial.

When a coroutine is resumed, any functions higher in the call stack are completed before resuming (returning to) functions lower in the stack. There are no restrictions on the use of the Lua value stack before or after using any of the above API functions because there cannot be an active function higher in the stack while a C function is executing (unlike when it is suspended).

All lua_*yield() API functions are tail calls in the sense that you must use them with a return statement, like this:

  return lua_vyield(L, na, ctx);
The call may or may not return to your C function before executing the return statement, depending on whether the standard yield or the yield throw mechanism is used. Do not attempt to outsmart this mechanism by adding code between the yield call and the return statement. When the context is NULL/zero (or lua_yield is used), your function will not be called again (a tail yield). Otherwise it will be called again when the coroutine is resumed.

lua_vpcall/lua_ipcall may fall back to the classic behaviour of creating a setjmp wrapper on the C stack. This happens when no such wrapper exists yet or when there is an intervening non-resumable call boundary in the C stack (e.g. when used from hooks or from __gc). Of course the resulting call stack is not resumable, but then it wasn't resumable before that, anyway. You will hardly notice in practice because the standalone Lua executable (lua.c) always wraps the main chunk with lua_pcall() and of course lua_resume() creates a setjmp wrapper, too.

Another point to note about lua_vpcall/lua_ipcall (but not lua_pcall) is that they may leave the callback function and its arguments on the stack below the error message, in case of error (sorry, but this is very difficult to solve in the Lua core). You have to be careful not to make any assumptions about relative stack levels when the protected call fails. Of course the error message is guaranteed to be in the topmost stack slot (relative index -1) and anything below the callback function (absolute index 1..(func-1)) is still intact, too.

I removed lua_cpcall() from the core and replaced it with a simple macro using standard API calls. I suggest that it be deprecated because it's redundant.

Tutorial: Keeping the Context

Here is a short tutorial showing the changes you need to make to your C functions to make them resumable (changed/new marked with **/++).

EXAMPLE #1: Simple flag -- table.foreach():

    static int foreach (lua_State *L) {
++    if (lua_vcontext(L)) goto resume;
      luaL_checktype(L, 1, LUA_TTABLE);
      luaL_checktype(L, 2, LUA_TFUNCTION);
      lua_pushnil(L);  /* first key */
      for (;;) {
        if (lua_next(L, 1) == 0)
          return 0;
        lua_pushvalue(L, 2);  /* function */
        lua_pushvalue(L, -3);  /* key */
        lua_pushvalue(L, -3);  /* value */
**      lua_icall(L, 2, 1, 1);
++  resume:
        if (!lua_isnil(L, -1))
          return 1;
        lua_pop(L, 2);  /* remove value and result */
Here everything needed to resume is already on the Lua stack (the previous key). So replacing lua_call() with lua_icall() and setting a simple flag (1) is all you need to do.

Also note that the goto can safely jump around the initial checks because the stack contents are guaranteed to stay the same when your C function is resumed. This omits redundant checks (e.g. the standard mechanism to check userdata metatables is slow).

If you really, really hate gotos then of course you can use if/switch constructs, too. But you have to realize that they obscure the 'standard' control flow. This is one of the cases where it makes perfect sense to use goto. And your compiler will print a big fat warning when you jump into a loop and forget to fetch the loop counter from the context, too.

EXAMPLE #2: Loop counter -- print():

    static int luaB_print (lua_State *L) {
      int n = lua_gettop(L);  /* number of arguments */
**    int i = lua_icontext(L);
++    if (i) {
++      n -= 2;  /* compensate for tostring function and result */
++      goto resume;
++    }
      lua_getglobal(L, "tostring");
      for (i=1; i<=n; i++) {
        const char *s;
        lua_pushvalue(L, -1);  /* function to be called */
        lua_pushvalue(L, i);   /* value to print */
**      lua_icall(L, 1, 1, i);
++  resume:
        s = lua_tostring(L, -1);  /* get result */
        if (s == NULL)
          return luaL_error(L, "`tostring' must return a string to `print'");
        if (i>1) fputs("\t", stdout);
        fputs(s, stdout);
        lua_pop(L, 1);  /* pop result */
      fputs("\n", stdout);
      return 0;
Here we simply store the loop counter in the context. You need to be careful to use only non-zero indexes because a zero context argument marks a non-resumable call (and could not be distinguished from the initial call, anyway).

The stack level may change during the execution of the function and this can lead to a number of problems on resumption. Either keep it at a fixed level (you can ensure this with lua_settop()) or use only relative indexes

or compensate for it after retrieving the context (see above).

EXAMPLE #3: Protected calls -- pcall():

    static int luaB_pcall (lua_State *L) {
++    int status = lua_icontext(L);
++    if (status) goto resume;
      luaL_checkany(L, 1);
**    status = lua_ipcall(L, lua_gettop(L) - 1, LUA_MULTRET, 0, -1);
++  resume:
~~    if (status > 0) {  /* error */
~~      lua_pushboolean(L, 0);
~~      lua_insert(L, -2);  /* args may be left on stack with vpcall/ipcall */
~~      return 2;  /* return status + error */
~~    }
~~    else {  /* ok */
~~      lua_pushboolean(L, 1);
~~      lua_insert(L, 1);
~~      return lua_gettop(L);  /* return status + all results */
~~    }
~~  }
There are four cases that can happen with lua_vpcall/lua_ipcall:

Unless you need the context argument for your own purposes (avoid the range for the error numbers though), you can simply set the context to -1 and assign it to the status on resume. This allows for a simple check for > 0 (error) or <= 0 (ok). You cannot use zero because this marks a non-resumable pcall.

The above code works around the fact that lua_vpcall/lua_ipcall may leave the call function and its arguments on the stack (but only when an error is thrown during call setup). It's easier to explicitly code the two possible outcomes than using inline conditionals (as before).

EXAMPLE #4: Userdata pointer:

    typedef struct { ... } mytype_t;

    static int my_userdata_method (lua_State *L) {
      mytype_t *ud = (mytype_t *)lua_vcontext(L);
++    if (ud) goto resume;
      ud = (mytype_t *)luaL_checkudata(L, 1, MYTYPE_HANDLE);
      if (ud == NULL) ... /* error handling */
      ... /* check other args here */

      ... /* start processing */
      ... /* be sure to save all context in the userdata structure */
**    lua_vcall(L, na, nr, ud);  /* pass userdata as context */
++  resume:
      ... /* continue processing */
This is a typical example of a userdata method. Most of the time you can store all needed context in the userdata and just use the userdata pointer itself as a context argument. Of course the userdata is still on the stack (it must be or it may be garbage collected), so you could just retrieve the pointer again. But that would be slower and the context argument is already there, so why not make good use of it?

Userdata methods are a perfect example for lua_vyield(), too:

      while (read_operation(ud, ...) == BLOCKING) {
        ... /* do any processing needed after a blocking indication */
**      return lua_vyield(L, na, ud);
++  resume:
        ... /* do any processing needed resuming the read */
Here a potentially blocking I/O operation is attempted. Instead of blocking, the operation returns a special status flag (BLOCKING). This allows you to save the context and yield (e.g. back to a coroutine scheduler), avoiding to block on the I/O operation. When the function is resumed, you just retry, continue or finish the I/O operation. This needs to be done in a loop if the I/O operation may block repeatedly.

EXAMPLE #5a: State machine with switch:

    static int myfunction (lua_State *L) {
++    int state = lua_icontext(L);
++    switch (state) {
++    case 0:  /* initial call */
**      lua_vcall(L, na, nr, 1);
++    case 1:
**      lua_vcall(L, na, nr, 2);
++    case 2:
**      lua_vcall(L, na, nr, 3);
++    case 3:
        return n;
Sometimes you need to use a bunch of callbacks or yields interspersed in linear control flow. Here it may be beneficial to use a state machine and save the current state (the position in the control flow) in the context. You can assign the states by hand for simple needs (as shown above, but probably with defines instead of numbers). For more complex needs you could use macros or even a precompiler (written in Lua, of course?) to generate the state machine automatically from your code.

EXAMPLE #5b: State machine with goto:

If you are using GCC you will be delighted to hear that it has a very useful (but non-standard) extension called labels as values aka computed goto. Read more about in the GCC info docs.

This is best combined with the local label extension inside a macro. Here is the above example converted to use this feature (cleaner and quite a bit faster, too):

++  #define rvcall(L, na, nr) \
++    ({ __label__ RR; lua_vcall(L, (na), (nr), &&RR); RR: ; })

    static int myfunction (lua_State *L) {
++    void *cont = lua_vcontext(L);
++    if (cont) goto *cont;
**    rvcall(L, na, nr);
**    rvcall(L, na, nr);
**    rvcall(L, na, nr);
      return n;

To Yield or Not To Yield

Here is table showing you exactly where your code can yield and where it cannot.

Note that most of the remaining "no's" are irrelevant in practice.

Yield across ...  | Yield from ...         | Ok? | Rationale
for x in func     | iterator function      | Yes |
VM operations     | metamethod except __gc | Yes |
(anywhere)        | __gc metamethod        | No  | Difficult, rarely useful
(anywhere)        | count/line hook        | Yes | Only via C API hooks
(anywhere)        | call/return hooks      | No  | Does anyone need this?
error processing  | err. handler/traceback | No  | Not very useful
pcall()           | protected callback     | Yes |
xpcall()          | protected callback     | Yes | [Deprecated]
epcall()      NEW | protected callback     | Yes | Sane API, replaces xpcall
print()           | tostring() callback    | Yes |
tostring()        | __tostring metamethod  | Yes |
dofile()          | chunk                  | Yes | It was simple :-)
table.foreach()   | callback               | Yes |
table.foreachi()  | callback               | Yes |
string.gsub()     | callback               | No  | Tricky, I have no use case
table.sort()      | callback, __lt metam.  | No  | Not very useful
require()         | chunk                  | No  | Not very useful
debug.sethook()   | Lua hook function      | No  | Please use a C hook
load()            | chunk reader           | No  | Parser is not resumable
lua_call()        | callback               | No  | For compatibility (macro)
lua_vcall()   NEW | callback               | Yes |
lua_icall()   NEW | callback               | Yes | (macro)
lua_pcall()       | protected callback     | No  | For compatibility (macro)
lua_cpcall()      | protected callback     | No  | For compatibility (macro)
lua_vpcall()  NEW | protected callback     | Yes |
lua_ipcall()  NEW | protected callback     | Yes | (macro)
lua_load()        | chunk reader           | No  | Parser is not resumable
lua_equal()       | __eq metamethod        | No  | (*)
lua_lessthan()    | __lt metamethod        | No  | (*)
lua_gettable()    | __index metamethod     | No  | (*)
lua_getfield()    | __index metamethod     | No  | (*)
lua_settable()    | __newindex metamethod  | No  | (*)
lua_setfield()    | __newindex metamethod  | No  | (*)
lua_concat()      | __concat metamethod    | No  | (*)
(*) There is no need. Just call the metamethod with lua_vcall/lua_icall. Or perform these kinds of operations with Lua code (always resumable).


RecentChanges · preferences
edit · history
Last edited July 3, 2010 11:32 pm GMT (diff)