On Saturday, April 5, 2014, Rena <hyperhacker@gmail.com> wrote:
On Sat, Apr 5, 2014 at 4:14 PM, Sean Conner <sean@conman.org> wrote:
It was thus said that the Great Tim Hill once stated:
>
> On Apr 5, 2014, at 10:25 AM, Rena <hyperhacker@gmail.com> wrote:
>
> > Of course one thread can't touch another thread's Lua state, since a
> > state can only be used by one thread at a time. So we need to store the
> > received messages somewhere until the receiving thread asks for them. No
> > problem, we have a struct for each thread already (holding its
> > lua_State* and other nice things); we can copy the string into a list in
> > that struct, using mutexes to ensure it's not being read and written at
> > the same time.
> >
> > The issue with that design, though, is excess copying. If thread A sends
> > a message to thread B, that means A has a copy of the string, B's
> > incoming message queue will have another copy, and B's Lua state will
> > make a third copy when lua_pushlstring() is used to receive the message.
> > I'd quite like to eliminate that extra copy.
>
> Are you really sure the copying is a significant enough problem to justify
> the complexity of your solutions? I’ve designed several marshaling systems
> around Lua, and the “extra hop” really wasn’t significant in terms of
> performance. Depending on your message volume, you may even find that a
> fixed pool of buffers is very efficient, as the CPU caches will quickly
> “warm” and the extra copy overhead will get even smaller.
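
The per-thread mailbox described in the quoted text above (a list of copied strings guarded by a mutex) might look roughly like the following. This is only a minimal sketch assuming POSIX threads; the names `mailbox`, `msg`, `mailbox_init`, `mailbox_send` and `mailbox_receive` are hypothetical and not taken from anyone's actual code in this thread.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; a sketch, not the poster's real code. */

/* One queued message: a private heap copy of the sender's bytes. */
typedef struct msg {
    struct msg *next;
    size_t      len;
    char        data[];          /* flexible array member holds the payload */
} msg;

/* Per-thread mailbox; in the design above it would live in the same
 * struct as the thread's lua_State*. */
typedef struct mailbox {
    pthread_mutex_t lock;
    msg *head, *tail;
} mailbox;

void mailbox_init(mailbox *mb) {
    pthread_mutex_init(&mb->lock, NULL);
    mb->head = mb->tail = NULL;
}

/* Sender side: this is the second copy (the first is the string held by
 * the sender's Lua state).  Returns 0 on success, -1 if malloc fails. */
int mailbox_send(mailbox *mb, const char *s, size_t len) {
    msg *m = malloc(sizeof *m + len);
    if (m == NULL) return -1;
    m->next = NULL;
    m->len  = len;
    memcpy(m->data, s, len);

    pthread_mutex_lock(&mb->lock);
    if (mb->tail) mb->tail->next = m; else mb->head = m;
    mb->tail = m;
    pthread_mutex_unlock(&mb->lock);
    return 0;
}

/* Receiver side: detach the oldest message, or return NULL if the queue
 * is empty.  The caller would then call
 * lua_pushlstring(L, m->data, m->len) (the third copy) and free(m). */
msg *mailbox_receive(mailbox *mb) {
    pthread_mutex_lock(&mb->lock);
    msg *m = mb->head;
    if (m) {
        mb->head = m->next;
        if (mb->head == NULL) mb->tail = NULL;
    }
    pthread_mutex_unlock(&mb->lock);
    return m;
}
```

The lock is held only while the list pointers change. Both the `memcpy()` on the sending side and the eventual `lua_pushlstring()` on the receiving side happen outside it, so the extra copy does not lengthen the time threads spend contending for the mutex.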

  I'll second this.  At work, I'm currently writing a server [1] in Lua.
Yes, I do have some C in there, but that's to handle the sockets, the calls
to select() [2] and other "interfacing to the operating system" type calls.
The data that comes in is all text (SIP messages, which follow the Internet
Text Message Format of RFC-822 for the most part) so there's quite a bit of
parsing.  I also have to deal with encoding and decoding packets for a
proprietary protocol (a mixture of binary and text).

  I've profiled the code under what I would call a moderate load and the
results have been quite interesting.  On the C side of things, the top *two
dozen* routines were all in the Lua core [3].  Profiling the code in Lua [4]
showed the main loop [5] was taking all the time; the parsing/marshalling
code was a few orders of magnitude less than the main loop.

  I also recorded the memory usage of Lua during all this (a few runs of nearly
20 hours each).  When I was first running tests (usually around 10 minutes,
maybe half an hour) I would get alarmed at the memory growth, and even spent
some time playing around with the garbage collection parameters.  I
shouldn't have bothered.  The extended runs, with the default settings, were
fine and memory usage was consistent (after a period of time---the graph is
somewhat wild for the first hour or two, then it settles down).

  Don't be afraid of the copies.

  -spc (Obviously, I am unafraid of the footnotes ... )

[1]     For lack of a better term.  It's actually a network component that
        receives messages, does some processing, may query another network
        component and then reply to the message.

[2]     Actually, epoll_wait() under Linux, poll() everywhere else.  I do
        have code to support select() but I don't know of any modern Unix
        system (which is what we use) that only has select().

[3]     The number one spot?  luaV_execute().  Not terribly surprising.  And
        yes, this server is using stock Lua [6].

[4]     Every 1,000,000 Lua instructions, obtain the file, function and line
        of the Lua state.  Save, and when done, print it out, sorted first
        by count (highest first) then by file:function:line number.  It's
        about 40 lines of Lua code, with about a quarter devoted to sorting
        the results.

[5]     Basically (simplified a bit, but not much)

                while not process.sig.caught() do
                  schedule_sleeping_coroutines()
                  local events = SOCKETS:events(0) -- select loop
                  for i = 1,#events do
                    events[i].obj(events[i]) -- handle packet from socket
                  end
                  execute_run_queue_of_coroutines()
                end

        Yes,
Thanks for the feedback everyone. I'm just going to go ahead and accept the potential performance hit of an extra copy in exchange for how much simpler it makes the code.

--
Sent from my Game Boy.

Nanomsg supports zero-copy messaging with ipc. It also uses I/O completion ports, epoll or whatever is native to the system. My binding exports the socket/file handle (Windows) in a way that co-mingles with luasocket, if needed. In our testing, we don't see it, because our clock steps in 1/240th of a second chunks and it's waaaay below that.

It's alpha but it's been very stable. That plus llthreads2 sounds like a good first prototype to attempt. 

-Andrew