lua-users home
lua-l archive



On Friday 08, Matthew Wild wrote:
> Ed sent this to me off-list, but suggested that he shouldn't have as
> my response was worth a wider audience.
> 
> In fact I am also interested to know if anyone has experience of the
> issues below, or the various libraries. I did try LuaLanes a couple of
> years back, but only got it to deadlock. Now that it is somewhat better
> maintained I should look into it again when I get the chance.

I have created a lightweight threads module for Lua [1].  Each thread gets 
its own lua_State instance and there is no locking or shared state.  To 
communicate between running threads you have to use a communications module 
(LuaSocket, or preferably lua-zmq [2]).

My ZeroMQ bindings (lua-zmq [2]) include a wrapper object 'zmq.threads' [3] 
for the llthreads module, which makes it easy to start threads from a block 
of Lua code or a Lua file.
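For example, spawning a child lua_State from a string of code looks roughly 
like this (a sketch only; `runstring` is the wrapper function name as I 
recall it from [3], so double-check the current sources):

```lua
-- Sketch: start a thread from a block of Lua code with zmq.threads.
-- Assumes lua-zmq and lua-llthreads are installed.
local zmq     = require"zmq"
local threads = require"zmq.threads"

local ctx = zmq.init(1)

-- The child code runs in its own lua_State; the only thing shared with
-- the parent is the ZeroMQ context.
local child = threads.runstring(ctx, [[
    local greeting = ...
    print(greeting)
]], "hello from a new lua_State")

child:start()
child:join()    -- wait for the child state to finish

ctx:term()
```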

The ZeroMQ guide [6] has a lot of design patterns that show how to scale.  
With ZeroMQ you can scale across multiple threads, multiple servers, or 
both.  I have ported most of the C examples from the guide to Lua [7].

The lua-zmq threaded latency benchmark [4]:
message size: 512B
roundtrip count: 1,000,000

lua-5.1.4:
mean latency: 7.270 microseconds

luajit-2:
mean latency: 6.066 microseconds

The lua-zmq threaded throughput benchmark [5]:
message size: 512B
roundtrip count: 10,000,000

lua-5.1.4:
mean throughput: 1,159,872 msg/sec
mean throughput: 4,750.837 Mbits/sec

luajit-2:
mean throughput: 1,820,319 msg/sec
mean throughput: 7,456.105 Mbits/sec


More comments below:

> 
> On 8 April 2011 17:33, Ed <spied@yandex.ru> wrote:
> > On 04/07/11 18:56, Matthew Wild wrote:
> >> Hi,
> >> 
> >> We are very pleased to announce the release of Prosody 0.8.0! Many
> >> thanks to the many people who made this release possible.
> > 
> > sorry for the dumb question ;)
> > 
> > what multitasking model does prosody use?
> > it seems not to use fork or threads.
> 
> It does not currently fork or use threads, no. This is primarily
> because we are not certain of the best way to do it. The "best" in
> this case being the way that gives us the best performance for the
> least code complexity (ideally little to none of the existing code
> would be changed).
> 
> Forking isn't a good option, as unlike in a HTTP server in XMPP all
> the connections may need to frequently exchange data with each other.
> We also couldn't really scale by forking for each client (we should be
> able to handle hundreds of thousands of concurrent connections), so
> there would also be some code complexity in pooling and balancing load
> between a (presumably fixed) number of processes.
> 
> Threading is a more viable alternative, but again there are many
> different ways to do it, some options we have considered:
> 
> * One thread handling connections, and multiple worker threads for
> processing messages from a queue (producer/consumer style):
>  This approach may not bring much performance benefit, as it may end up
> with a number of global locks around things like storage and network. Most
> code would be untouched, except for where locks are required.

Don't use shared state between the threads.  Keep the communication between 
threads asynchronous (i.e. use message passing).
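To make that concrete, here is a minimal sketch of two lua_States exchanging 
messages over an in-process pipe (socket calls per lua-zmq [2]; 
`get_parent_ctx` is the helper name I recall from the threads wrapper [3], 
treat it as an assumption):

```lua
-- Sketch: message passing between two lua_States, no shared state, no locks.
local zmq     = require"zmq"
local threads = require"zmq.threads"

local ctx = zmq.init(1)

-- Parent end of an in-process pipe; only messages cross the boundary.
local pipe = ctx:socket(zmq.PAIR)
pipe:bind("inproc://pipe")

local child = threads.runstring(ctx, [[
    local zmq     = require"zmq"
    local threads = require"zmq.threads"
    local ctx  = threads.get_parent_ctx()
    local pipe = ctx:socket(zmq.PAIR)
    pipe:connect("inproc://pipe")
    pipe:send("pong: " .. pipe:recv())
    pipe:close()
]])
child:start()

pipe:send("ping")
local reply = pipe:recv()   -- "pong: ping", copied between the two states

child:join()
pipe:close()
ctx:term()
```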

> * Thread pool for specific blocking tasks, such as storage (few truly
> async storage APIs are available)
>   This is currently our favoured approach. We just scale out the few
> parts we know can be bottlenecks, the majority of work still happens
> in one main thread, meaning no locking or other threading issues can
> appear so easily.

Creating a pool of worker threads with lua-zmq would be easy to do.  Also, 
ZeroMQ sockets can be added to the main event loop so you get notified when 
work is finished.  See the 'handler.zmq' [8] object for how to embed a ZeroMQ 
socket into a lib-ev event loop.  It shouldn't be too hard to port that to 
the libevent interface.
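A sketch of such a pool: jobs are fanned out over PUSH/PULL, and completions 
come back on a socket that a real server would register with its event loop 
via handler.zmq [8].  The API names are the same assumptions as above:

```lua
-- Sketch: a pool of worker threads for blocking tasks (e.g. storage).
local zmq     = require"zmq"
local threads = require"zmq.threads"

local ctx = zmq.init(1)

local jobs    = ctx:socket(zmq.PUSH); jobs:bind("inproc://jobs")
local results = ctx:socket(zmq.PULL); results:bind("inproc://results")

-- Each worker blocks on its own PULL socket; the main thread never blocks
-- on storage, it only reads completions from `results`.
local worker_code = [[
    local zmq     = require"zmq"
    local threads = require"zmq.threads"
    local ctx     = threads.get_parent_ctx()
    local jobs    = ctx:socket(zmq.PULL); jobs:connect("inproc://jobs")
    local results = ctx:socket(zmq.PUSH); results:connect("inproc://results")
    results:send("ready")
    while true do
        local job = jobs:recv()
        if job == "quit" then break end
        -- stand-in for a blocking storage call
        results:send("done: " .. job)
    end
    jobs:close(); results:close()
]]

local POOL = 2
local pool = {}
for i = 1, POOL do
    pool[i] = threads.runstring(ctx, worker_code)
    pool[i]:start()
end

-- Wait for every worker to connect, so PUSH round-robins over all of them.
for i = 1, POOL do assert(results:recv() == "ready") end

jobs:send("load-roster")
local done = results:recv()   -- in a real server the event loop reads this

for i = 1, POOL do jobs:send("quit") end
for i = 1, POOL do pool[i]:join() end
jobs:close(); results:close(); ctx:term()
```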


> * What I call the "magic" LuaProc/erlang style, where we can just put
> each session in a coroutine, and let the scheduler deal with the
> multi-CPU scaling.
>  This is attractive as it requires very little work on the threading,
> we can just focus on using coroutines and message passing. It should
> also be possible to easily scale down to 1 thread.

It would be interesting to see a distributed version of ConcurrentLua that 
used ZeroMQ for thread-to-thread & host-to-host message passing.
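For readers who want the flavour of that style without any native libraries: 
the core of a LuaProc/Erlang-ish design is just coroutines plus per-session 
mailboxes.  A toy single-threaded sketch (everything below is illustrative, 
not any particular library's API):

```lua
-- Toy sketch of the "coroutine per session + mailbox" style, one OS thread.
local sessions, mailbox, runq = {}, {}, {}

-- Create a session and run it until its first recv().
local function spawn(name, fn)
    sessions[name] = coroutine.create(fn)
    mailbox[name]  = {}
    coroutine.resume(sessions[name])
end

-- Deliver a message and schedule the receiver.
local function send(to, msg)
    table.insert(mailbox[to], msg)
    runq[#runq + 1] = to
end

-- Block the calling session until the scheduler hands it a message.
local function recv()
    return coroutine.yield()
end

-- Round-robin scheduler: resume each runnable session with one message.
local function run()
    while #runq > 0 do
        local name = table.remove(runq, 1)
        local co, box = sessions[name], mailbox[name]
        if #box > 0 and coroutine.status(co) == "suspended" then
            coroutine.resume(co, table.remove(box, 1))
        end
    end
end

seen = {}   -- global so the demo below is observable
spawn("session1", function()
    while true do
        seen[#seen + 1] = recv()
    end
end)

send("session1", "presence")
send("session1", "message")
run()
-- seen is now { "presence", "message" }
```

A real scheduler would distribute the run queue over several OS threads to 
get the multi-CPU scaling; the session code itself would not change.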

> The main reason we haven't jumped on any one of these is that most of
> them would be a lot of work to implement, and we have no hard data on
> which of them are going to bring us the most benefit without messing
> up our clean single-threaded (and already fast) code. Implementing
> threading in the wrong way could actually hurt performance, because of
> communication between threads slowing the system down.
> 
> And of course another significant reason is that CPU is rarely the
> bottleneck for an XMPP server, it's typically RAM (when you have tens
> of thousands of clients with various state, you do need a decent
> server RAM-wise). So scaling across CPUs has yet to be a priority for
> us.

Where is all the RAM being used?  Have you profiled the memory usage?  I 
would think that the XML encoding/decoding would use a lot of CPU.

> More of a priority is scaling across multiple machines - ie. having
> multiple Prosody instances serving the same domain (for load-balancing
> and reliability). This would also allow multiple Prosodies to run on
> one machine and share the CPUs.

ZeroMQ would allow you to scale from multiple threads to many servers.  I 
would recommend diagramming the flow of messages from client connections to 
different parts of the 'backend' services (i.e. storage, database, message 
routing services).  If you separate those shared components into units that 
communicate with messages, then you can either keep them local or move them to 
different threads or servers.  Also, if you use the design patterns from the 
ZeroMQ guide [6], you can scale each backend service from one instance to 
multiple instances as needed, without changing the code.
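The "local or remote without code changes" point comes down to the endpoint 
string: the same socket code works whether the peer is another thread or 
another host.  A sketch (lua-zmq API as above; the `STORAGE_ENDPOINT` 
variable is a made-up example, not anything Prosody defines):

```lua
-- Sketch: the transport lives in configuration, not code.
local zmq = require"zmq"
local ctx = zmq.init(1)

-- Change the endpoint string and the same service moves from an
-- in-process thread to its own machine.
local endpoint = os.getenv("STORAGE_ENDPOINT") or "inproc://storage"
-- e.g. STORAGE_ENDPOINT="tcp://*:5555" on a dedicated storage box

local service = ctx:socket(zmq.REP)
service:bind(endpoint)
-- ... recv()/send() request loop, identical for both transports ...
service:close()
ctx:term()
```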

> > does prosody use concurrent multitasking based on select (or something
> > similar)? it is not like copas either - coroutines are rarely used.
> 
> By default Prosody uses non-blocking connections and select(), and if
> you install luaevent then we also have a libevent backend, which
> allows us to use epoll/kqueue/etc. for greater efficiency.
> 
> We don't use a coroutine per connection like copas does, we use
> callbacks instead. Some rough benchmarks when we first started hinted
> that we could save quite a bit on resources by not using coroutines.

I agree about the resource overhead of coroutines.


1. https://github.com/Neopallium/lua-llthreads
2. https://github.com/Neopallium/lua-zmq
3. https://github.com/Neopallium/lua-zmq/blob/master/src/threads.lua
4. https://github.com/Neopallium/lua-zmq/blob/master/perf/thread_lat.lua
5. https://github.com/Neopallium/lua-zmq/blob/master/perf/thread_thr.lua 
6. http://zguide.zeromq.org/lua:all
7. https://github.com/imatix/zguide/tree/master/examples/Lua
8. https://github.com/Neopallium/lua-handlers/blob/master/handler/zmq.lua

-- 
Robert G. Jakabosky