lua-l archive



Hi,

Andre Carregal wrote:
> The recent (hot) discussion about threads x coroutines x sockets was
> certainly interesting but sometimes a bit cryptic for me.

[BTW: I saw your private mail, but haven't gotten around to answering it yet,
      because I wanted to provide a summary explaining all the issues.]

> During it, duck asked a question that I'd also like to know more about. What
> would be the problems involved with a coroutine based server dispatcher?
> Xavante has been on hold for quite a while simply because I haven't figured
> out that one.

This shouldn't hold you back. You are designing for specific requirements.
If the approach works for you, then just use it.

> We were recently discussing this draft and it evolved to the attached
> prototype of coroutine based pcalls and xpcalls for Lua 5.0. They have been
> successfully used on the current single-connection-at-a-time Xavante version
> by simply making pcall = copcall and xpcall = coxpcall.
> [...]
> I'd like to find out how much this approach would be useful/stable. Since
> Xavante is targeted for embedded devices, performance is not a big issue but
> stability is.

The approach itself is perfectly valid. It just does not work well when
you have many thousands of connections at a time. So this is all about
performance issues and maybe a bit about usability and orthogonality.

If you don't need to worry about performance and/or don't need to integrate
C code, then go for it. Otherwise read on for the details below.

[ But as an aside -- as far as embedded devices are concerned, there are
  two separate use cases:

1. Using an embedded web-server for remote configuration (standard on
   switches, routers and other gadgets nowadays). There you'll be lucky
   if you see more than a handful of connections at a time. There is no
   need to worry about performance at all (but beware of memory leaks).

2. If the primary task of the device is to serve as a web-server, there
   is no inherent rule that it should not serve thousands of connections
   at a time. Modern embedded servers have enough resources (CPU and RAM)
   to do that. But since they are more constrained than some big iron server
   in a datacenter, you really have to watch for performance issues.
]


Coroutine scheduling, yield and the C stack
-------------------------------------------

The primary difficulty with coroutine-based connection multiplexing is
the fact that you cannot block anywhere. You must be able to yield back
to the coroutine scheduler at any point in your control flow where you
do some I/O that might block (of course using non-blocking I/O calls,
so you can detect that condition).

However, in the current state of affairs, the Lua API does not allow you
to yield across the C call stack boundary. This has several practical
implications:


1. You cannot yield from a function protected by pcall.

The current implementation of pcall adds a C stack frame to store the
setjmp() jmp_buf and a few more frames to call back into the core VM loop.

You can either stop using pcall or use the above mentioned coroutine-based
replacement.
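The prototype attached to the original mail isn't reproduced here. As a
rough, independent sketch of the general technique (not the Xavante code),
a coroutine-based pcall runs the function in a fresh coroutine, passes any
yields through to the outer scheduler and turns errors into a false return.
It assumes Lua 5.1 or later (it uses select('#', ...); Lua 5.0 would need
the arg table instead), and the pack helper is hypothetical.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1

-- Hypothetical helper: pack all values, including trailing nils.
local function pack(...)
  return { n = select("#", ...), ... }
end

-- Coroutine-based pcall: f runs in its own coroutine, so a yield inside f
-- does not have to cross the C frame that a native pcall adds in Lua 5.0.
function copcall(f, ...)
  local co = coroutine.create(f)
  local res = pack(coroutine.resume(co, ...))
  while true do
    if not res[1] then
      return false, res[2]                 -- f raised an error: like pcall
    end
    if coroutine.status(co) == "dead" then
      return true, unpack(res, 2, res.n)   -- f returned normally
    end
    -- f yielded: hand the values to our own resumer (the scheduler) and
    -- pass whatever it resumes us with back into f
    res = pack(coroutine.resume(co, coroutine.yield(unpack(res, 2, res.n))))
  end
end
```

With this in place, 'pcall = copcall' works as described in the quoted
mail. Note that every call creates a new coroutine.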

Not using pcall isn't as bad as it sounds. For simple approaches you can
just rely on the error trapping ability of lua_resume(). But this means
that your whole coroutine (i.e. connection) is dead when any error occurs.
It cannot be resurrected and all implicit context (local variables and
where you left off) is lost.

In simple cases just kill the coroutine and send an error message (if you
manage to get hold of the connection handle). Complex cases can be solved
with a stack of finalizers that needs to be explicitly managed. But none
of this is simple -- just being able to use pcall would be a lot better.
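For the "stack of finalizers" variant, a minimal sketch could look like
this. new_conn, push_finalizer, pop_finalizer and step are hypothetical
names; step stands in for the scheduler's call to lua_resume(). Since the
dead coroutine's locals are gone, anything that needs cleanup must be
registered outside of it.

```lua
-- Hypothetical per-connection state; only the finalizer stack is modeled.
local function new_conn()
  return { finalizers = {} }
end

-- Register cleanup code for a resource the coroutine has just acquired.
local function push_finalizer(conn, fn)
  conn.finalizers[#conn.finalizers + 1] = fn
end

-- Drop the most recent finalizer once its resource was released normally.
local function pop_finalizer(conn)
  conn.finalizers[#conn.finalizers] = nil
end

-- Stand-in for the scheduler's lua_resume() call: on error the coroutine is
-- dead, so run the registered finalizers in LIFO order (each one protected,
-- so a failing finalizer cannot stop the others).
local function step(conn, co, ...)
  local ok, err = coroutine.resume(co, ...)
  if not ok then
    for i = #conn.finalizers, 1, -1 do
      pcall(conn.finalizers[i], err)
    end
    conn.finalizers = {}
  end
  return ok, err
end
```

The pcall in step is harmless: it runs in the scheduler, which never needs
to yield.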

The alternative with copcall has a few drawbacks in performance sensitive
environments:
- It requires you to use an extra coroutine for each pcall level.
  Note that there may be more than a single level. And the common case
  is that you need to yield from the deepest level, so these coroutines
  tend to stick around.
- These extra coroutines take up memory (>800 bytes each on x86 right now).
  This is definitely an issue when you have many connections.
- Unless you recycle these extra coroutines you will lose a lot
  of performance just by creating them every time. And the GC will be
  very busy cleaning up after you.
- You lose quite a bit of performance, since you have to do all of this
  in Lua. You cannot rewrite copcall completely in C, because you cannot
  yield from it to a lower level without breaking out of the control flow
  (see below).
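Recycling, as suggested in the third bullet above, could be sketched like
this. Assumptions: Lua 5.1 or later, and DONE, idle, worker_body and
pooled_call are hypothetical names. A worker coroutine signals a finished
job by yielding a private DONE marker and then waits for its next job. A
worker whose job raised an error is dead and is simply dropped.

```lua
local unpack = table.unpack or unpack

local function pack(...)
  return { n = select("#", ...), ... }
end

local DONE = {}  -- private marker: "job finished, results follow"
local idle = {}  -- suspended workers waiting for their next job

-- Body of a reusable worker: run a job, report DONE plus its results,
-- then receive the next job from the resume that follows.
local function worker_body(f, ...)
  local args = pack(...)
  while true do
    local res = pack(f(unpack(args, 1, args.n)))
    local job = pack(coroutine.yield(DONE, unpack(res, 1, res.n)))
    f, args = job[1], pack(unpack(job, 2, job.n))
  end
end

-- pcall-like wrapper that reuses idle workers instead of creating a new
-- coroutine on every call.
local function pooled_call(f, ...)
  local co = table.remove(idle) or coroutine.create(worker_body)
  local res = pack(coroutine.resume(co, f, ...))
  while true do
    if not res[1] then
      return false, res[2]               -- worker died: do not recycle it
    end
    if res[2] == DONE then
      idle[#idle + 1] = co               -- park the worker for reuse
      return true, unpack(res, 3, res.n)
    end
    -- an intermediate yield from f: pass it through to our own resumer
    res = pack(coroutine.resume(co, coroutine.yield(unpack(res, 2, res.n))))
  end
end
```

This only saves the creation cost. The memory of the parked workers is
still there, and it is still all Lua code.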


2. You cannot yield across lua_call().

This is because it needs to set up a few C stack frames to call back into
the core VM loop.

The typical use case is callbacks from C code. Say you have an XML stream
parser that calls a bunch of user-defined Lua functions for every opening
tag, closing tag and content element. If one of these Lua functions needs
to yield (e.g. when writing something to a network socket), you are out
of luck.

In this example you may be able to modify the parser to use an iterative
approach (call into the parser and get one element at a time as a return
value). But this is impractical in other cases. And it does not solve
the problem when the parser is reading from a stream that may block
(see below).
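To illustrate the iterative (pull) style, here is a toy tokenizer for a
tiny, hypothetical XML subset (tags without attributes and plain text, no
comments or entities). make_pull_parser is a made-up name, and it is
written in Lua only to show the shape of the interface. The point is that
the consumer owns the loop, so its handler can yield between elements.

```lua
-- Hypothetical pull-style tokenizer: each call of the returned function
-- hands back the next element instead of invoking a callback.
local function make_pull_parser(s)
  local pos = 1
  return function()
    if pos > #s then return nil end  -- end of input
    local _, last, slash, name = string.find(s, "^<(/?)([%w_]+)>", pos)
    if last then
      pos = last + 1
      return (slash == "" and "open" or "close"), name
    end
    if string.sub(s, pos, pos) == "<" then
      error("unsupported markup at position " .. pos)
    end
    -- text runs up to the next "<" (or the end of the input)
    local stop = (string.find(s, "<", pos, true) or (#s + 1)) - 1
    local text = string.sub(s, pos, stop)
    pos = stop + 1
    return "text", text
  end
end

-- The consumer drives the loop, so it may yield after every element
-- (e.g. while waiting for a socket to become writable).
local function consume(doc)
  for kind, value in make_pull_parser(doc) do
    coroutine.yield(kind, value)
  end
  return "done"
end
```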


3. You cannot continue in your C control flow after lua_yield().

[This may be a bit difficult to understand unless you know the Lua internals.]

The reason is that lua_yield() is more or less just a 'return -1'. Your
C code is supposed to pass this return value back to the Lua VM. I.e. you
can only use it like 'return lua_yield(L, nresults)'.

Normally your C function returns the number of return values it has put
on the Lua stack. But when the Lua VM sees the special value '-1', it
decides that you want to yield instead. It then manually unwinds the C stack
until it gets down to lua_resume() and returns from it.

But now the problem: when the scheduler later resumes the coroutine with
lua_resume() it does not resume the C routine that you yielded from.
It continues with the surrounding Lua code.

This basically means that your C function has been dropped from the
call stack and there is no way to get it back in control unless you
wrap some Lua (!) code around it.

[Note that the fact that lua_yield() and lua_resume() pass back and forth
 some objects is really a side issue here. All a coroutine-based scheduler
 really needs is to get back control.]

A simple example is a wrapper around the socket receive call from LuaSocket:

function wrap_receive(sk, pattern)
  local s, err, part
  repeat
    s, err, part = sk:receive(pattern, part)
    if s or err ~= "timeout" then return s, err end
    -- note: timeout is zero, so this is really a "blocking" indication
    coroutine.yield(whatever_you_need_to_send_to_your_scheduler)
  until false
end

You'll need to provide a wrapper for _every_ single C routine that may
indicate a blocking condition. And you need to be extra careful to pass
back and forth between C and Lua all partial results, state ids and so on.
And be sure to tell your coroutine scheduler exactly which event(s) you'd
like to receive to wake up your coroutine. All of this is well known by
the C code, but you need to match this with Lua code. As you can imagine
this quickly gets annoying, error-prone _and_ slow.

[Note that Diego changed the LuaSocket API to make this a lot easier
 in this specific case. There is a very subtle interaction with the
 "part" variable and the internal LuaSocket connection context going on.]
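On the scheduler side, a toy version can be sketched without LuaSocket.
fake_socket, run and connection are hypothetical. The fake only imitates
the sk:receive(pattern, part) call used in wrap_receive, for the "*l"
pattern, handing out one more character per call. The wrapper is repeated
(now yielding the socket, since the scheduler has to know what to wait
for) so that the sketch runs on its own. A real scheduler would wait in
select() or poll() on the yielded sockets instead of blindly retrying
every task.

```lua
-- Hypothetical stand-in for a LuaSocket TCP object with a zero timeout:
-- returns nil, "timeout", partial until the whole line has "arrived".
local function fake_socket(line)
  local pos = 0
  local sk = {}
  function sk:receive(pattern, prefix)
    assert(pattern == "*l", "this fake only models the '*l' pattern")
    pos = pos + 1
    local data = (prefix or "") .. string.sub(line, pos, pos)
    if pos < #line then
      return nil, "timeout", data  -- would block: hand back the partial line
    end
    return data                    -- complete line
  end
  return sk
end

-- The wrapper from above, yielding the socket to the scheduler.
local function wrap_receive(sk, pattern)
  local s, err, part
  repeat
    s, err, part = sk:receive(pattern, part)
    if s or err ~= "timeout" then return s, err end
    coroutine.yield(sk)
  until false
end

local results = {}

-- Round-robin scheduler: resume every task once per pass, drop dead ones.
local function run(tasks)
  while #tasks > 0 do
    for i = #tasks, 1, -1 do
      local co = tasks[i]
      local ok, err = coroutine.resume(co)
      if not ok then
        results[#results + 1] = "error: " .. tostring(err)
      end
      if coroutine.status(co) == "dead" then
        table.remove(tasks, i)
      end
    end
  end
end

-- One coroutine per (fake) connection.
local function connection(name, line)
  return coroutine.create(function()
    local sk = fake_socket(line)
    results[#results + 1] = name .. " got " .. wrap_receive(sk, "*l")
  end)
end

run({ connection("A", "hello"), connection("B", "hi") })
```

Even in this toy, the "what am I waiting for" information has to be
passed explicitly from the wrapper to the scheduler.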


4. You cannot build up a stack of C calls and then yield.

There are more complex cases where you'd like to yield a few call levels
down into your C code _and_ resume wherever you left off. There is a
standard approach, called 'stack ripping' that allows you to do this.
It's ugly, but basically workable _as long as you stay on the C side_.
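To give an idea of what stack ripping looks like, here is the transformation
applied to a hypothetical length-prefixed protocol (a decimal length, a
newline, then that many payload bytes). It is written in Lua for brevity.
In C, st would be a struct allocated once per connection. new_reader and
feed are made-up names. Instead of keeping its position on the stack, the
reader records it in st.state, returns nil whenever it runs out of input
and picks up at the saved state on the next call.

```lua
-- All progress is kept in st, not on the (C or Lua) stack.
local function new_reader()
  return { state = "header", buf = "", need = 0 }
end

-- Feed newly received bytes. Returns the next complete payload, or nil if
-- more input is needed; in that case the next call resumes at st.state.
local function feed(st, data)
  st.buf = st.buf .. data
  while true do
    if st.state == "header" then
      local nl = string.find(st.buf, "\n", 1, true)
      if not nl then return nil end             -- continuation point 1
      st.need = tonumber(string.sub(st.buf, 1, nl - 1))
      if not st.need then error("bad length header") end
      st.buf = string.sub(st.buf, nl + 1)
      st.state = "body"
    else -- st.state == "body"
      if #st.buf < st.need then return nil end  -- continuation point 2
      local payload = string.sub(st.buf, 1, st.need)
      st.buf = string.sub(st.buf, st.need + 1)
      st.state = "header"
      return payload
    end
  end
end
```

feed returns at most one payload per call. If more complete messages are
already buffered, call feed(st, "") again.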

But you still need to have a Lua wrapper that _somehow_ has to preserve
all of the context. This is just not going to work well in the general
case (think about a stack of protocol layers) and bad for encapsulation, too.

And please don't suggest allocating an object on the heap every time you
need to yield ... because you'll need to yield very often (for every
line or packet in most network protocols).


There are many ways to solve this dilemma. Here is a quick checklist
to show how well the various approaches that came up on the list work:

                         | 1. pcall | 2. lua_call | 3. C flow | 4. C stack
-------------------------+----------+-------------+-----------+-----------
A. Out of the box        |    No    |      No     |     No    |     No
B. copcall               |   Yes    |      No     |     No    |     No
C. True C coro. [Mike]   |   Yes    |     Yes     |    Yes    |    Yes
D. Improved coro. [Eric] | Yes (NYI)|     Yes     |   (Yes)   |     No

NYI = not yet implemented


As I have stated elsewhere, my approach with true C coroutines (inspired
by Edgar Toernig) is not workable when you aim for portability. And it
has some other problems in practice. I've stopped working on it.

Read the original list thread for more info:
http://lua-users.org/lists/lua-l/2004-10/msg00217.html


Eric's approach (which can be described as a variant of stack ripping
inside the Lua core) is portable, but needs a bit more effort on the
C side. You need to manage the continuation points and the stack ripping
yourself (I consider this a fair compromise). If you are wondering about
pcall: this was missing from the preliminary patch, but is certainly doable.

Read more about it at: http://lua-users.org/wiki/ImprovedCoroutinesPatch


As Roberto has pointed out, the Lua authors are aware of the problem and
would like to see a (portable) solution. This could be done along the lines
of Eric's patch, but the impact on the Lua core needs to be minimized.
This may be hard to do, but it's worth it.

I'm still trying to come up with a solution and already experimented with
a few variants. But I have to put more time and thinking in it. Don't hold
your breath.

Maybe someone else has taken a closer look at this problem and will come
up with a good solution.

Bye,
     Mike