Very nice! :D

On Fri, Mar 12, 2010 at 2:17 PM, Chuck Coffing <clc@alum.mit.edu> wrote:
> Hi list,
>
> At work we use Lua in an embedded environment.  It's a large project that
> does a lot of network I/O.  After some profiling, I discovered that
> luaL_addlstring was consuming a large share of CPU time; much of that
> usage was initiated by LuaSocket.
>
> It turns out that luaL_addlstring calls luaL_addchar for every byte, which
> means that multiple dereferences, a test, and a jump occur for every byte
> received over the network.
>
> The buffer, however, already knows how much space it has available, so the
> repeated tests are unnecessary.  I changed luaL_addlstring to memcpy the
> largest chunk that is known to fit, and then expand the buffer as needed.
>
> For a trivial LuaSocket script that receives data, the change cuts the number
> of instructions executed by more than 50%.  (I use Valgrind's cachegrind tool
> to count instructions.)  I recall (although it's been a while) that it cut the
> instruction count of our app by about 13% overall.
>
> The change only minimally increases the code size (16 bytes larger on x86):
>
> chuck@magma:~/lua-perf$ nm -S lua.orig | grep addlstring
> 08059ec0 00000068 T luaL_addlstring
> chuck@magma:~/lua-perf$ nm -S lua.perf | grep addlstring
> 08059ec0 00000078 T luaL_addlstring
>
> I made the change on 5.1.4, but it looks like 5.2 has the same performance
> issue.
>
> The patch is below, followed by the trivial test script and Valgrind output.
>
> --
> Chuck
>
>
>
> --- lua-5.1.4.orig/src/lauxlib.c    2008-01-21 06:20:51.000000000 -0700
> +++ lua-5.1.4/src/lauxlib.c 2010-03-12 05:48:39.000000000 -0700
> @@ -434,8 +434,19 @@
>
>
>  LUALIB_API void luaL_addlstring (luaL_Buffer *B, const char *s, size_t l) {
> -  while (l--)
> -    luaL_addchar(B, *s++);
> +  while (l) {
> +    size_t min;
> +    size_t avail = bufffree(B);
> +    if (!avail) {
> +      luaL_prepbuffer(B);
> +      avail = bufffree(B);
> +    }
> +    min = avail <= l ? avail : l;
> +    memcpy(B->p, s, min);
> +    B->p += min;
> +    s += min;
> +    l -= min;
> +  }
>  }
>
>
>
>
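For anyone who wants to step through the patched loop outside of Lua, here is a minimal standalone sketch of the same chunked-copy idea. Everything in it (ToyBuffer, toy_init, toy_free, toy_flush, toy_addlstring, TOY_BUFSIZE) is a hypothetical stand-in invented for illustration, not part of lauxlib; toy_flush only plays the role that luaL_prepbuffer plays in the patch by making room when the block is full.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZE 8  /* deliberately tiny so the "buffer full" path runs */

/* Hypothetical stand-in for luaL_Buffer: a fixed block plus a sink that
   collects whatever gets flushed out of the block. */
typedef struct {
  char     buf[TOY_BUFSIZE];
  char    *p;        /* next free byte in buf */
  char     out[256]; /* flushed bytes accumulate here */
  size_t   outlen;
  unsigned flushes;  /* how many times the block was emptied */
} ToyBuffer;

static void toy_init(ToyBuffer *B) {
  B->p = B->buf;
  B->outlen = 0;
  B->flushes = 0;
}

/* Bytes still free in the block (the role bufffree plays in the patch). */
static size_t toy_free(const ToyBuffer *B) {
  return (size_t)(TOY_BUFSIZE - (B->p - B->buf));
}

/* Empty the block into the sink so the next copy has room. */
static void toy_flush(ToyBuffer *B) {
  size_t n = (size_t)(B->p - B->buf);
  assert(B->outlen + n <= sizeof B->out);
  memcpy(B->out + B->outlen, B->buf, n);
  B->outlen += n;
  B->p = B->buf;
  B->flushes++;
}

/* Same shape as the patched loop: copy the largest chunk known to fit,
   and only test for space once per chunk instead of once per byte. */
static void toy_addlstring(ToyBuffer *B, const char *s, size_t l) {
  while (l) {
    size_t n;
    size_t avail = toy_free(B);
    if (avail == 0) {
      toy_flush(B);
      avail = toy_free(B);
    }
    n = avail <= l ? avail : l;
    memcpy(B->p, s, n);
    B->p += n;
    s += n;
    l -= n;
  }
}
```

Appending 20 bytes to this 8-byte block takes three memcpy chunks where the per-byte loop would run 20 space checks; with a realistically sized block and 4096-byte socket reads the gap is far larger.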
> chuck@magma:~/lua-perf$ cat echo.lua
> require "socket"
>
> server, err = socket.bind("*", 0)
> assert(server, err)
> ip, port = server:getsockname()
> print("Please telnet to localhost on port " .. port)
> client, err = server:accept()
> total = 0
> while not err do
>  data, err = client:receive(4096)
>  if err then break end
>  client:send("receive(" .. tostring(#data) .. ")")
>  total = total + #data
>  if total >= 1024*1024 then break end
> end
> client:close()
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua
> ==8565== Cachegrind, a cache and branch-prediction profiler
> ==8565== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
> ==8565== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
> ==8565== Command: ./lua.orig echo.lua
> ==8565==
> Please telnet to localhost on port 60342
> ==8565==
> ==8565== I   refs:      16,345,890
> ==8565== I1  misses:        28,037
> ==8565== L2i misses:         2,426
> ==8565== I1  miss rate:       0.17%
> ==8565== L2i miss rate:       0.01%
> ==8565==
> ==8565== D   refs:       8,229,564  (4,619,517 rd   + 3,610,047 wr)
> ==8565== D1  misses:        53,970  (   32,310 rd   +    21,660 wr)
> ==8565== L2d misses:         5,991  (    3,442 rd   +     2,549 wr)
> ==8565== D1  miss rate:        0.6% (      0.6%     +       0.5%  )
> ==8565== L2d miss rate:        0.0% (      0.0%     +       0.0%  )
> ==8565==
> ==8565== L2 refs:           82,007  (   60,347 rd   +    21,660 wr)
> ==8565== L2 misses:          8,417  (    5,868 rd   +     2,549 wr)
> ==8565== L2 miss rate:         0.0% (      0.0%     +       0.0%  )
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua
> [...snip...]
> ==8570== I   refs:      16,345,917
> [...snip...]
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua
> [...snip...]
> ==8575== I   refs:      16,345,861
> [...snip...]
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua
> ==8580== Cachegrind, a cache and branch-prediction profiler
> ==8580== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
> ==8580== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
> ==8580== Command: ./lua.perf echo.lua
> ==8580==
> Please telnet to localhost on port 43713
> ==8580==
> ==8580== I   refs:      7,179,931
> ==8580== I1  misses:       26,354
> ==8580== L2i misses:        2,422
> ==8580== I1  miss rate:      0.36%
> ==8580== L2i miss rate:      0.03%
> ==8580==
> ==8580== D   refs:      4,563,501  (2,786,905 rd   + 1,776,596 wr)
> ==8580== D1  misses:       54,462  (   32,711 rd   +    21,751 wr)
> ==8580== L2d misses:        5,989  (    3,440 rd   +     2,549 wr)
> ==8580== D1  miss rate:       1.1% (      1.1%     +       1.2%  )
> ==8580== L2d miss rate:       0.1% (      0.1%     +       0.1%  )
> ==8580==
> ==8580== L2 refs:          80,816  (   59,065 rd   +    21,751 wr)
> ==8580== L2 misses:         8,411  (    5,862 rd   +     2,549 wr)
> ==8580== L2 miss rate:        0.0% (      0.0%     +       0.1%  )
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua
> [...snip...]
> ==8586== I   refs:      7,179,933
> [...snip...]
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua
> [...snip...]
> ==8591== I   refs:      7,179,922
> [...snip...]
> chuck@magma:~/lua-perf$
>
>
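To put a figure on "more than 50%", the reduction can be computed straight from the first cachegrind run of each binary quoted above. This is only a small sketch; percent_reduction and print_reduction are hypothetical helper names, and the counts passed in the usage note are copied from the output above.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which `after` is smaller than `before`. */
static double percent_reduction(unsigned long before, unsigned long after) {
  assert(before > 0 && after <= before);
  return 100.0 * (double)(before - after) / (double)before;
}

/* Print one labelled line, e.g. for "I refs" or "D refs". */
static void print_reduction(const char *label,
                            unsigned long before, unsigned long after) {
  printf("%-7s %.1f%% fewer\n", label, percent_reduction(before, after));
}
```

Calling print_reduction("I refs", 16345890UL, 7179931UL) reports a drop of roughly 56% in instructions, and print_reduction("D refs", 8229564UL, 4563501UL) a drop of roughly 45% in data references, which is consistent with the claim in the original message.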