lua-l archive

It was thus said that the Great Tim Hill once stated:
> 
> On Apr 5, 2014, at 10:25 AM, Rena <hyperhacker@gmail.com> wrote:
> 
> > Of course one thread can't touch another thread's Lua state, since a
> > state can only be used by one thread at a time. So we need to store the
> > received messages somewhere until the receiving thread asks for them. No
> > problem, we have a struct for each thread already (holding its
> > lua_State* and other nice things); we can copy the string into a list in
> > that struct, using mutexes to ensure it's not being read and written at
> > the same time.
> > 
> > The issue with that design, though, is excess copying. If thread A sends
> > a message to thread B, that means A has a copy of the string, B's
> > incoming message queue will have another copy, and B's Lua state will
> > make a third copy when lua_pushlstring() is used to receive the message.
> > I'd quite like to eliminate that extra copy. 
> 
> Are you really sure the copying is a significant enough problem to justify
> the complexity of your solutions? I’ve designed several marshaling systems
> around Lua, and the “extra hop” really wasn’t significant in terms of
> performance. Depending on your message volume, you may even find that a
> fixed pool of buffers is very efficient, as the CPU caches will quickly
> “warm” and the extra copy overhead will get even smaller.

  I'll second this.  At work, I'm currently writing a server [1] in Lua. 
Yes, I do have some C in there, but that's to handle the sockets, the calls
to select() [2] and other "interfacing to the operating system" type calls. 
The data that comes in is all text (SIP messages, which for the most part
follow the Internet Text Message Format of RFC 822), so there's quite a bit
of parsing.  I also have to deal with encoding and decoding packets for a
proprietary protocol (a mixture of binary and text).

  I've profiled the code under what I would call a moderate load and the
results have been quite interesting.  On the C side of things, the top *two
dozen* routines were all in the Lua core [3].  Profiling the code in Lua [4]
showed the main loop [5] was taking all the time; the parsing/marshalling
code took a few orders of magnitude less time than the main loop.

  I also recorded the memory usage of Lua during all this (a few runs of
nearly 20 hours each).  When I was first running tests (usually around 10
minutes, maybe half an hour) I would get alarmed at the memory growth, and
even spent some time playing around with the garbage collection parameters.
I shouldn't have bothered.  The extended runs, with the default settings,
were fine and memory usage was consistent after a settling period (the graph
is somewhat wild for the first hour or two, then it settles down).
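The post doesn't say how the memory usage was recorded.  A minimal sketch,
assuming the figure of interest is the Lua heap size reported by
collectgarbage("count") (in kilobytes), would sample it periodically, say
once a minute from the main loop, and keep the samples for graphing later.
The names sample_memory() and memory_summary() are hypothetical.

```lua
-- Hypothetical helpers: periodically record the Lua heap size so a long
-- run can be graphed afterwards.  collectgarbage("count") returns the
-- memory currently in use by Lua, in kilobytes.
local samples = {}

local function sample_memory()
  samples[#samples + 1] = {
    time = os.time(),                 -- seconds since the epoch
    kb   = collectgarbage("count"),   -- current Lua heap size in KB
  }
end

-- Summarize the samples taken so far: lowest, highest and latest value.
-- With no samples, returns math.huge, 0 and nil.
local function memory_summary()
  local low, high = math.huge, 0
  for _, s in ipairs(samples) do
    if s.kb < low then low = s.kb end
    if s.kb > high then high = s.kb end
  end
  local last = samples[#samples]
  return low, high, last and last.kb
end
```

Given the settling behaviour described above, a summary taken over the
first half hour says little; it only becomes meaningful over a long run.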

  Don't be afraid of the copies.

  -spc (Obviously, I am unafraid of the footnotes ... )

[1]	For lack of a better term.  It's actually a network component that
	receives messages, does some processing, may query another network
	component and then reply to the message.

[2]	Actually, epoll_wait() under Linux, poll() everywhere else.  I do
	have code to support select(), but I don't know of any modern Unix
	system (which is what we use) that only has select().

[3]	The number one spot?  luaV_execute().  Not terribly surprising.  And
	yes, this server is using stock Lua [6].

[4]	Every 1,000,000 Lua instructions, obtain the file, function and line
	of the Lua state.  Save, and when done, print it out, sorted first
	by count (highest first) then by file:function:line number.  It's
	about 40 lines of Lua code, with about a quarter devoted to sorting
	the results.

[5]	Basically (simplified a bit, but not much)

		while not process.sig.caught() do
		  schedule_sleeping_coroutines()
		  local events = SOCKETS:events(0) -- select loop
		  for i = 1,#events do
		    events[i].obj(events[i]) -- handle packet from socket
		  end
		  execute_run_queue_of_coroutines()
		end

	Yes, I create coroutines like crazy.  Each coroutine handles a
	transaction.  Sure, I have to manually yield when making a blocking
	call, but it allows the actual processing code to be pretty much:

		read_packet()
		parse_packet()
		if need_more_info() then
		  create_request()
		  send_request()
		  coroutine.yield()
		  read_request_packet()
		  parse_request_packet()
		  get_data()
		end
		make_reply()
		send_packet() -- reply to original packet

	instead of a mess of callback hell.
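	The scheduler functions aren't shown either.  As a minimal sketch,
	assuming a simple FIFO run queue and with the socket layer replaced by
	a direct call, a transaction coroutine can yield while it waits for a
	reply and be resumed when the reply arrives.  spawn(), wakeup(),
	waiting and handle_transaction() are hypothetical names; only
	execute_run_queue_of_coroutines() comes from the loop above.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1

-- Coroutines ready to run, each with the arguments to resume it with.
local run_queue = {}
-- Hypothetical: coroutines blocked on a reply, keyed by request id.
local waiting = {}

local function spawn(fn, ...)
  run_queue[#run_queue + 1] =
    { co = coroutine.create(fn), args = { n = select("#", ...), ... } }
end

-- Called when a reply for request `id` arrives (e.g. by a socket
-- handler): put the waiting coroutine back on the run queue.
local function wakeup(id, reply)
  local co = waiting[id]
  if co then
    waiting[id] = nil
    run_queue[#run_queue + 1] = { co = co, args = { n = 1, reply } }
  end
end

-- Resume every coroutine that is currently runnable.
local function execute_run_queue_of_coroutines()
  local ready = run_queue
  run_queue = {}  -- anything woken while running waits for the next pass
  for _, entry in ipairs(ready) do
    local ok, err = coroutine.resume(entry.co, unpack(entry.args, 1, entry.args.n))
    if not ok then
      io.stderr:write("transaction failed: ", tostring(err), "\n")
    end
  end
end

local replies = {}  -- stands in for send_packet() in this sketch

-- Hypothetical transaction that needs data from another component.
local function handle_transaction(id, packet)
  waiting[id] = coroutine.running()  -- register before yielding
  -- (a real server would send the request to the other component here)
  local data = coroutine.yield()     -- suspended until wakeup(id, ...)
  replies[#replies + 1] = packet .. " -> " .. data
end
```

	In the main loop, the socket handler would call wakeup() when the
	reply arrives, and the next execute_run_queue_of_coroutines() picks
	the transaction up right after its coroutine.yield().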

[6]	I would love to use LuaJIT, but the target platform is Sparc, and
	LuaJIT doesn't support that architecture.  Sigh.