It was thus said that the Great Tim Hill once stated:
> On Apr 5, 2014, at 10:25 AM, Rena <hyperhacker@gmail.com> wrote:
>
> > Of course one thread can't touch another thread's Lua state, since a
> > state can only be used by one thread at a time. So we need to store the
> > received messages somewhere until the receiving thread asks for them. No
> > problem, we have a struct for each thread already (holding its
> > lua_State* and other nice things); we can copy the string into a list in
> > that struct, using mutexes to ensure it's not being read and written at
> > the same time.
> >
> > The issue with that design, though, is excess copying. If thread A sends
> > a message to thread B, that means A has a copy of the string, B's
> > incoming message queue will have another copy, and B's Lua state will
> > make a third copy when lua_pushlstring() is used to receive the message.
> > I'd quite like to eliminate that extra copy.
>
> Are you really sure the copying is a significant enough problem to justify
> the complexity of your solutions? I’ve designed several marshaling systems
> around Lua, and the “extra hop” really wasn’t significant in terms of
> performance. Depending on your message volume, you may even find that a
> fixed pool of buffers is very efficient, as the CPU caches will quickly
> “warm” and the extra copy overhead will get even smaller.

I'll second this. At work, I'm currently writing a server [1] in Lua.
Yes, I do have some C in there, but that's to handle the sockets, the calls
to select() [2] and other "interfacing to the operating system" type calls.
The data that comes in is all text (SIP messages, which follow the Internet
Text Message Format of RFC-822 for the most part) so there's quite a bit of
parsing. I also have to deal with encoding and decoding packets for a
proprietary protocol (a mixture of binary and text).

I've profiled the code under what I would call a moderate load and the
results have been quite interesting. On the C side of things, the top *two
dozen* routines were all in the Lua core [3]. Profiling the code in Lua [4]
showed the main loop [5] was taking all the time; the parsing/marshalling
code took a few orders of magnitude less time than the main loop.

I also recorded the memory usage of Lua during all this (a few runs of
nearly 20 hours each). When I was first running tests (usually around 10
minutes, maybe half an hour) I would get alarmed at the memory growth, and
even spent some time playing around with the garbage collection parameters.
I shouldn't have bothered. The extended runs, with the default settings,
were fine and memory usage was consistent (after a period of time---the
graph is somewhat wild for the first hour or two, then it settles down).
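
For anyone wanting to collect the same kind of memory graph, a minimal sketch follows. It is not the code used for the runs above: log_memory and its two parameters are made-up names, and it assumes the main loop can afford a call to os.time() per pass. It samples collectgarbage("count"), which reports the memory in use by Lua in kilobytes.

```lua
-- Memory logging sketch (hypothetical helper, not from the original server).
local started = os.time()
local last = 0          -- time of the most recent sample

-- Write "elapsed-seconds kilobytes-in-use" to 'out' at most once every
-- 'every' seconds. 'out' is anything with a :write method, e.g. a file.
function log_memory(out, every)
  local now = os.time()
  if now - last >= every then
    last = now
    local kb = collectgarbage("count")
    out:write(string.format("%d %.1f\n", now - started, kb))
  end
end
```

Calling log_memory(io.stderr, 60) once per pass through the main loop would give one data point a minute, which is enough to tell early noise from a long-term trend.
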

Don't be afraid of the copies.

-spc (Obviously, I am unafraid of the footnotes ... )

[1] For lack of a better term. It's actually a network component that
receives messages, does some processing, may query another network
component and then reply to the message.
[2] Actually, epoll_wait() under Linux, poll() everywhere else. I do
have code to support select() but I don't know of any modern Unix
system (which is what we use) that only has select().
[3] The number one spot? luaV_execute(). Not terribly surprising. And
yes, this server is using stock Lua [6].
[4] Every 1,000,000 Lua instructions, obtain the file, function and line
of the Lua state. Save, and when done, print it out, sorted first
by count (highest first) then by file:function:line number. It's
about 40 lines of Lua code, with about a quarter devoted to sorting
the results.
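
The sampling described in [4] maps naturally onto debug.sethook() with a count hook. The following is a minimal sketch of that idea, not the actual 40 lines: profile_start, profile_report and the "file:function:line" key format are illustrative names chosen here.

```lua
-- Sampling profiler sketch. All names below are illustrative.
local counts = {}  -- "file:function:line" -> number of samples

local function sample()
  -- Level 2 is the function that was running when the count hook fired
  -- (level 0 is getinfo itself, level 1 is this hook).
  local info = debug.getinfo(2, "Sln")
  if info then
    local key = string.format("%s:%s:%d",
      info.short_src, info.name or "?", info.currentline)
    counts[key] = (counts[key] or 0) + 1
  end
end

-- Start sampling every 'interval' VM instructions (default 1,000,000).
function profile_start(interval)
  counts = {}
  debug.sethook(sample, "", interval or 1000000)
end

-- Stop sampling, print the samples sorted by count (highest first) and
-- then by key, and return the sorted rows.
function profile_report()
  debug.sethook()
  local rows = {}
  for key, n in pairs(counts) do
    rows[#rows + 1] = { key = key, n = n }
  end
  table.sort(rows, function(a, b)
    if a.n ~= b.n then return a.n > b.n end
    return a.key < b.key
  end)
  for _, row in ipairs(rows) do
    print(string.format("%8d  %s", row.n, row.key))
  end
  return rows
end
```

Note that debug.sethook() sets the hook for one Lua thread, so a program built on coroutines has to make sure the hook is active in the coroutines it wants sampled.
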
[5] Basically (simplified a bit, but not much)

    while not process.sig.caught() do
      schedule_sleeping_coroutines()
      local events = SOCKETS:events(0) -- select loop
      for i = 1,#events do
        events[i].obj(events[i]) -- handle packet from socket
      end
      execute_run_queue_of_coroutines()
    end

Yes, I create coroutines like crazy. Each coroutine handles a
transaction. Sure, I have to manually yield when making a blocking
call, but it allows the actual processing code to be pretty much:

    read_packet()
    parse_packet()
    if need_more_info() then
      create_request()
      send_request()
      coroutine.yield()
      read_request_packet()
      parse_request_packet()
      get_data()
    end
    make_reply()
    send_packet() -- reply to original packet

instead of a mess of callback hell.
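
To make the coroutine-per-transaction idea concrete, here is a small sketch of a run queue under stated assumptions. It is not the server's scheduler: spawn, wake, execute_run_queue and the convention of yielding a request id are all hypothetical stand-ins for whatever the real code does.

```lua
-- Coroutine run queue sketch. Every name here is hypothetical.
local run_queue = {}  -- entries: { co = coroutine, arg = value to resume with }
local waiting = {}    -- request id -> coroutine parked until its reply arrives

-- Queue a new transaction; 'packet' is handed to 'handler' on first resume.
function spawn(handler, packet)
  run_queue[#run_queue + 1] = { co = coroutine.create(handler), arg = packet }
end

-- Called by the event loop when the reply to request 'id' arrives.
function wake(id, reply)
  local co = waiting[id]
  if co then
    waiting[id] = nil
    run_queue[#run_queue + 1] = { co = co, arg = reply }
  end
end

-- Resume everything that is ready. A coroutine that yields a request id is
-- parked in 'waiting'; one that finishes (or fails) is dropped.
function execute_run_queue()
  local ready = run_queue
  run_queue = {}
  for i = 1, #ready do
    local co = ready[i].co
    local ok, id = coroutine.resume(co, ready[i].arg)
    if not ok then
      io.stderr:write("transaction failed: ", tostring(id), "\n")
    elseif coroutine.status(co) == "suspended" and id ~= nil then
      waiting[id] = co
    end
  end
end

-- Usage: a handler that needs one extra lookup before replying.
replies = {}
spawn(function(packet)
  local info = coroutine.yield("req-1")  -- reads like a blocking call
  replies[#replies + 1] = packet .. " -> " .. info
end, "INVITE")

execute_run_queue()     -- runs the handler until it yields
wake("req-1", "found")  -- the event loop saw the reply come in
execute_run_queue()     -- handler resumes after the yield and finishes
```

The handler reads top to bottom like blocking code, while the loop that calls execute_run_queue() stays free to service other sockets between the yield and the wake.
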
[6] I would love to use LuaJIT, but the target platform is Sparc, and
LuaJIT doesn't support that architecture. Sigh.