lua-users home
lua-l archive



  This is a long response, but I do go into deep detail about a system I'm
currently developing at work.  Unfortunately, there's quite a bit of detail
I need to go into before I get to the actual results, but the executive
summary will be: don't worry about the memory usage with coroutines.

It was thus said that the Great Andrew Starks once stated:
> I'm trying out a concurrency approach.[1]
> 
> I was wondering about threads and garbage. That is, if I keep calling
> 'create(fun)', will it keep making new threads, even when prior threads
> died. I assumed 'yes' and then wrote a library to create a thread pool.
> 
> I had wondered if there was an indirection in Lua's implementation that
> made this unnecessary. That is, maybe Lua was holding onto a thread that
> was dead and returning that to use with the new function, thus reducing the
> amount of potential garbage, at the expense of the book keeping.

  By "thread" I assume you mean a Lua "coroutine".  I know the Lua manual
uses both terms interchangeably, but for here, I'm going to limit myself to
"coroutine".  Also, I'm defining "main" as the main coroutine that is
created when you create a new Lua state.

  At work, I'm creating a network service [2] that creates a coroutine per
request.  About six months ago I did a bunch of load tests as I was
concerned about both performance and memory usage.  First, a bit about how
the program I wrote works.

  At its core, it's an event-based server (not based on select() [3], but
I'll probably use the term 'select' because that's still a popular
interface for handling network events), but it creates coroutines to handle
the requests in a (in my opinion) sane, straightforward, imperative manner. 
The coroutines, once started, run until they block (making a network
request, for example, a DNS query) or they're done.  There is no
pre-emption.

  The framework has three "queues" for coroutines, one to store a reference
to all the currently active coroutines (so they aren't GCed out from under
us), one to store coroutines that have requested a timeout (it's actually a
priority heap implemented as an array, where the "priority" is when the
coroutine should be resumed) and one more that is a list of all coroutines
that are ready to run.
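
  As a minimal sketch, the three queues might look like this (the names
are my own invention, not the framework's actual identifiers):

	-- Set of all live coroutines, keyed by the coroutine itself,
	-- so the GC can't collect them out from under us.
	COROS   = {}
	-- Priority heap (stored as an array) ordered by wake-up time.
	TIMEOUT = {}
	-- Plain list of coroutines ready to be resumed.
	RUNQ    = {}

	function track(co)
	  COROS[co] = true
	end

	function untrack(co)
	  COROS[co] = nil
	end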

  Generally, the main loop is something like:
  
  	get the current time.

	check the timeout queue for any coroutines that have "timed out".
  	for each coroutine that has "timed out", remove it from the timeout
  	queue, and append it to the run queue.

	calculate the timeout for the call to select(). Basically:  if the
	timeout queue and run queue are empty, set an indefinite timeout; if
	there's anything in the run queue, set the select timeout to 0;
	otherwise, calculate the timeout based on when the next coroutine
	will timeout.

	call select(timeout) to get network events.  Call the handler for
	each network event (more on this below)
	
	if the run queue isn't empty, resume each coroutine in the list.  If
	the status of the coroutine is 'dead', remove the reference from the
	master reference queue.
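
  The timeout calculation in the middle step can be isolated as a pure
function; a sketch (names are mine, and I'm assuming the heap keeps the
earliest wake-up time at index 1):

	-- Returns the timeout to pass to select(): 0 if work is pending,
	-- nil (block indefinitely) if there is nothing to wait for,
	-- otherwise the seconds until the next coroutine times out.
	function select_timeout(runq,timeouts,now)
	  if #runq > 0 then
	    return 0
	  elseif timeouts[1] then
	    return timeouts[1].when - now
	  else
	    return nil
	  end
	end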
	
  As stated before, all coroutines will run until they block (and are
placed in the timeout queue) or they finish.  The handlers basically
retrieve the packet, do some checking, and either create a coroutine (I have
a routine to do this) or resume one (each handler can have its own
IO waiting queue of coroutines).  For example, the bit of code to handle a
DNS query:

	function query_server(server,query,rr)
	  local id = server:next_id()
	  local e  = dns.encode{
	                id       = id,
	                query    = true,
	                rd       = true,
	                opcode   = 'query',
	                question = {
	                        name  = query,
	                        type  = rr,
	                        class = 'IN',
	                }
	        }
	
	  server:set_record(id)			-- add this coroutine to the DNS IO wait queue
	  server.socket:send(server.address,e)	-- send the query
	  timeout(server.timeout)		-- to guard against slow or dropped packets
	  local info,err = coroutine.yield() 	-- yield
	  timeout(0)				-- remove a pending timeout (always safe to call timeout(0))
	  server:remove_record(id) 		-- remove us from the DNS IO wait queue
	
	  return info,err
	end

  The handler for the DNS socket will receive the response, look up the
appropriate coroutine from the DNS IO wait queue, and add it to the run
queue.
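
  Such a handler might look something like this (a sketch---the wait queue
and run queue names are hypothetical, and I'm assuming the packet has
already been decoded into a table with the query id):

	WAITING = {}  -- DNS IO wait queue: query id -> waiting coroutine
	RUNQ    = {}  -- run queue

	function dns_handler(reply)
	  local co = WAITING[reply.id]
	  if co then
	    WAITING[reply.id] = nil
	    RUNQ[#RUNQ + 1]   = { co = co , result = reply }
	  end
	  -- replies with no matching id (late or stray) are dropped
	end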

  The handler for the main service will check to see if it's an initial
packet, and if so, create a coroutine; otherwise, it will determine the
proper coroutine to resume.

  I can also create free-running coroutines that "run" independently. 
Here's the one I use in the code (spawn() creates the coroutine and
schedules it on the run queue; stat.gauge() collects stats [4]):

	spawn(function()
	  while true do
	    stat.gauge('foo.gc',collectgarbage('count') * 1024)
	    stat.gauge('foo.co',WQUEUECNT)
	    timeout(1) 
	    coroutine.yield()
	  end
	end)

  Every second (at most---keep reading) record the memory usage of the
system, and the number of coroutines.  The stats are dumped at a fixed rate
(every minute in the current setup) and consist of the minimum value,
average value and maximum value seen over that time period.  
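
  The aggregation a gauge does over one dump interval can be sketched
like so (my own illustration of the idea, not the actual daemon's code):

	Gauge = {}
	Gauge.__index = Gauge

	function Gauge.new()
	  return setmetatable(
	    { min = math.huge , max = -math.huge , sum = 0 , n = 0 },
	    Gauge)
	end

	function Gauge:sample(v)
	  if v < self.min then self.min = v end
	  if v > self.max then self.max = v end
	  self.sum = self.sum + v
	  self.n   = self.n   + 1
	end

	-- Called once per dump interval; returns min,avg,max and resets.
	function Gauge:dump()
	  local min,avg,max = self.min , self.sum / self.n , self.max
	  self.min , self.max , self.sum , self.n = math.huge , -math.huge , 0 , 0
	  return min,avg,max
	end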

  I say the sample is every second, but in reality, because of the way I
handle the coroutines, it may take several seconds to get around to checking
the timeout queue, and several more seconds for it to actually run; if this
becomes an issue, I can probably rework the main loop to either run one
coroutine from the run queue per loop, or a fixed number of coroutines per
loop, instead of *all* the coroutines as I do now.  
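
  That rework would amount to capping the resumes per iteration, something
like (a sketch; the batch size and names are mine):

	MAX_PER_LOOP = 16

	-- Resume at most MAX_PER_LOOP coroutines from the front of the
	-- run queue; return how many are still waiting for next time.
	function run_some(runq)
	  local n = math.min(#runq,MAX_PER_LOOP)
	  for i = 1 , n do
	    local co = table.remove(runq,1)
	    coroutine.resume(co)
	  end
	  return #runq
	end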

  But that's an optimization for a later time.  This coroutine does run
enough to collect stats, and the results were rather amusing.  I wish I had
graphs, but I don't---the bits used to display them have long been recycled. 
With that said ... 

  At the time, I was concerned about the memory usage, which is why I added
the above coroutine, and the results from some early runs were rather
discouraging.  BUT the load testing was only run for a few minutes at a
time.  I decided to let the load testing run over a weekend and graph the
results the following Monday.

  At first, the memory was all over the place, very jagged.  But then an odd
thing happened a few hours in (somewhere between two and four hours)---the
average memory usage dropped (and boy, did it drop) and evened out.  After
about eight hours, it dropped a bit more, and from there on, it remained
steady.  And this with the default GC settings in Lua (all the while there
was significant load on the system).  And this was a consistent pattern I
saw each time I let it run for at least ten hours (usually I let it run
overnight).

  I'm no longer concerned about a large number of coroutines and memory
usage.

  -spc (the graphs were quite pretty ... )

> [1] I love the libraries that are already out there. I need to more deeply
> understand concurrency and its effects, before I can use an abstraction
> that makes it "easy" for me, so I'm writing my own, for now.

[2]	A SIP based service.

[3]	Most Lua networking libraries I've seen use select() to handle
	network events, but I use epoll() (Linux) or poll() (non-Linux
	systems).

[4]	I wrote my own stats collection daemon based off the etsy statd:

	https://github.com/etsy/statsd/

	The concept is simple enough that it was easier to write my own than
	to attempt to use the etsy version (it's written in JavaScript,
	which we don't use in my department).