- Subject: Re: Performance improvement in luaL_addlstring
- From: Majic <majic.one@...>
- Date: Mon, 15 Mar 2010 07:31:13 +0000
Very nice! :D
On Fri, Mar 12, 2010 at 2:17 PM, Chuck Coffing <clc@alum.mit.edu> wrote:
> Hi list,
>
> At work we use Lua in an embedded environment. It's a large project that
> does a lot of network IO. After some profiling, I discovered that
> luaL_addlstring was consuming a large share of CPU time; much of that
> usage was initiated by LuaSocket.
>
> It turns out that luaL_addlstring calls luaL_addchar for every byte, which
> means that multiple dereferences, a test, and a jump occur for every byte
> received over the network.
>
> The buffer, however, already knows how much space it has available, so the
> repeated tests are unnecessary. I changed luaL_addlstring to memcpy the
> largest chunk that is known to fit, and then expand the buffer as needed.
>
> For a trivial luasocket client that receives data, the change cuts the number
> of instructions executed by more than 50%. (I use valgrind to count
> instructions.) I recall (although it's been a while) that it cut the
> instruction count of our app by about 13% overall.
>
> The change only minimally increases the code size (16 bytes larger on x86):
>
> chuck@magma:~/lua-perf$ nm -S lua.orig | grep addlstring
> 08059ec0 00000068 T luaL_addlstring
> chuck@magma:~/lua-perf$ nm -S lua.perf | grep addlstring
> 08059ec0 00000078 T luaL_addlstring
>
> I made the change on 5.1.4, but it looks like 5.2 has the same performance
> issue.
>
> The patch is below, followed by the trivial test script and the Valgrind output.
>
> --
> Chuck
>
>
>
> --- lua-5.1.4.orig/src/lauxlib.c 2008-01-21 06:20:51.000000000 -0700
> +++ lua-5.1.4/src/lauxlib.c 2010-03-12 05:48:39.000000000 -0700
> @@ -434,8 +434,19 @@
>
>
> LUALIB_API void luaL_addlstring (luaL_Buffer *B, const char *s, size_t l) {
> -  while (l--)
> -    luaL_addchar(B, *s++);
> +  while (l) {
> +    size_t min;
> +    size_t avail = bufffree(B);
> +    if (!avail) {
> +      luaL_prepbuffer(B);
> +      avail = bufffree(B);
> +    }
> +    min = avail <= l ? avail : l;
> +    memcpy(B->p, s, min);
> +    B->p += min;
> +    s += min;
> +    l -= min;
> +  }
> }
>
>
>
>
> chuck@magma:~/lua-perf$ cat echo.lua
> require "socket"
>
> server, err = socket.bind("*", 0)
> assert(server, err)
> ip, port = server:getsockname()
> print("Please telnet to localhost on port " .. port)
> client, err = server:accept()
> total = 0
> while not err do
>   data, err = client:receive(4096)
>   if err then break end
>   client:send("receive(" .. tostring(#data) .. ")")
>   total = total + #data
>   if total >= 1024*1024 then break end
> end
> client:close()
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua
> ==8565== Cachegrind, a cache and branch-prediction profiler
> ==8565== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
> ==8565== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
> ==8565== Command: ./lua.orig echo.lua
> ==8565==
> Please telnet to localhost on port 60342
> ==8565==
> ==8565== I refs: 16,345,890
> ==8565== I1 misses: 28,037
> ==8565== L2i misses: 2,426
> ==8565== I1 miss rate: 0.17%
> ==8565== L2i miss rate: 0.01%
> ==8565==
> ==8565== D refs: 8,229,564 (4,619,517 rd + 3,610,047 wr)
> ==8565== D1 misses: 53,970 ( 32,310 rd + 21,660 wr)
> ==8565== L2d misses: 5,991 ( 3,442 rd + 2,549 wr)
> ==8565== D1 miss rate: 0.6% ( 0.6% + 0.5% )
> ==8565== L2d miss rate: 0.0% ( 0.0% + 0.0% )
> ==8565==
> ==8565== L2 refs: 82,007 ( 60,347 rd + 21,660 wr)
> ==8565== L2 misses: 8,417 ( 5,868 rd + 2,549 wr)
> ==8565== L2 miss rate: 0.0% ( 0.0% + 0.0% )
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua
> [...snip...]
> ==8570== I refs: 16,345,917
> [...snip...]
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua
> [...snip...]
> ==8575== I refs: 16,345,861
> [...snip...]
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua
> ==8580== Cachegrind, a cache and branch-prediction profiler
> ==8580== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
> ==8580== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
> ==8580== Command: ./lua.perf echo.lua
> ==8580==
> Please telnet to localhost on port 43713
> ==8580==
> ==8580== I refs: 7,179,931
> ==8580== I1 misses: 26,354
> ==8580== L2i misses: 2,422
> ==8580== I1 miss rate: 0.36%
> ==8580== L2i miss rate: 0.03%
> ==8580==
> ==8580== D refs: 4,563,501 (2,786,905 rd + 1,776,596 wr)
> ==8580== D1 misses: 54,462 ( 32,711 rd + 21,751 wr)
> ==8580== L2d misses: 5,989 ( 3,440 rd + 2,549 wr)
> ==8580== D1 miss rate: 1.1% ( 1.1% + 1.2% )
> ==8580== L2d miss rate: 0.1% ( 0.1% + 0.1% )
> ==8580==
> ==8580== L2 refs: 80,816 ( 59,065 rd + 21,751 wr)
> ==8580== L2 misses: 8,411 ( 5,862 rd + 2,549 wr)
> ==8580== L2 miss rate: 0.0% ( 0.0% + 0.1% )
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua
> [...snip...]
> ==8586== I refs: 7,179,933
> [...snip...]
> chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua
> [...snip...]
> ==8591== I refs: 7,179,922
> [...snip...]
> chuck@magma:~/lua-perf$
>
>
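
For anyone who wants to play with the idea outside of Lua, here is a minimal standalone sketch of the same chunked copy. It is not the patch and it does not use Lua's API: DemoBuffer, DEMO_BUFSIZE, demo_init, demo_free, demo_flush and demo_addlstring are all hypothetical names made up for illustration, with demo_free standing in roughly for bufffree and demo_flush for the "make room" step that luaL_prepbuffer performs in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for luaL_Buffer (NOT Lua's real type): a small
   fixed-size staging area plus a flat array that collects flushed bytes. */
#define DEMO_BUFSIZE 16

typedef struct {
  char *p;                 /* next free position in buf */
  char buf[DEMO_BUFSIZE];  /* staging area */
  char out[256];           /* flushed bytes accumulate here */
  size_t outlen;           /* number of valid bytes in out */
  unsigned flushes;        /* how many times buf was emptied */
} DemoBuffer;

static void demo_init(DemoBuffer *B) {
  B->p = B->buf;
  B->outlen = 0;
  B->flushes = 0;
}

/* Free space left in the staging area (plays the role of bufffree). */
static size_t demo_free(const DemoBuffer *B) {
  return (size_t)(DEMO_BUFSIZE - (B->p - B->buf));
}

/* Move the staged bytes to out and reset the staging area. */
static void demo_flush(DemoBuffer *B) {
  size_t n = (size_t)(B->p - B->buf);
  assert(B->outlen + n <= sizeof B->out);  /* demo only: out is fixed-size */
  memcpy(B->out + B->outlen, B->buf, n);
  B->outlen += n;
  B->p = B->buf;
  B->flushes++;
}

/* Chunked append: one space check and one memcpy per chunk,
   instead of one space check per byte. */
static void demo_addlstring(DemoBuffer *B, const char *s, size_t l) {
  while (l > 0) {
    size_t avail = demo_free(B);
    size_t n;
    if (avail == 0) {      /* staging area full: empty it first */
      demo_flush(B);
      avail = demo_free(B);
    }
    n = avail <= l ? avail : l;  /* largest chunk known to fit */
    memcpy(B->p, s, n);
    B->p += n;
    s += n;
    l -= n;
  }
}
```

To try it, call demo_init, then demo_addlstring with any string of up to 256 bytes, then demo_flush once to push out the tail. With the 16-byte staging area above, appending a 40-byte string goes around the loop three times (16, 16 and 8 bytes), so the space check runs three times rather than forty.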