lua-l archive

Subject: Performance improvement in luaL_addlstring
From: Chuck Coffing <clc@...>
Date: Fri, 12 Mar 2010 07:17:27 -0700

Hi list,

At work we use Lua in an embedded environment.  It's a large project that
does a lot of network I/O.  After some profiling, I discovered that
luaL_addlstring was consuming a large share of CPU time; much of that
usage was initiated by LuaSocket.

It turns out that luaL_addlstring calls luaL_addchar for every byte, which
means that multiple dereferences, a test, and a jump occur for every byte
received over the network.

The buffer, however, already knows how much space it has available, so the
repeated tests are unnecessary.  I changed luaL_addlstring to memcpy the
largest chunk that is known to fit, and then expand the buffer as needed.
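
To illustrate the difference, here is a minimal toy model of the two
strategies.  It is only a sketch and not Lua's code: ToyBuffer, TOY_BUFSIZE,
and the toy_* functions are hypothetical names, toy_flush stands in for
luaL_prepbuffer, and the "checks" field exists only to count capacity tests.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUFSIZE 16  /* hypothetical; stands in for LUAL_BUFFERSIZE */

typedef struct {
  char buf[TOY_BUFSIZE];
  char *p;         /* next free position in buf */
  size_t flushed;  /* bytes already handed off by toy_flush */
  size_t checks;   /* number of capacity tests performed */
} ToyBuffer;

static void toy_init (ToyBuffer *B) {
  B->p = B->buf;
  B->flushed = 0;
  B->checks = 0;
}

static size_t toy_free (ToyBuffer *B) {
  return (size_t)(TOY_BUFSIZE - (B->p - B->buf));
}

/* Stand-in for luaL_prepbuffer: empty the buffer so it can be reused. */
static void toy_flush (ToyBuffer *B) {
  B->flushed += (size_t)(B->p - B->buf);
  B->p = B->buf;
}

/* One capacity test per byte, as in the original loop. */
static void toy_add_bytewise (ToyBuffer *B, const char *s, size_t l) {
  while (l--) {
    B->checks++;
    if (toy_free(B) == 0) toy_flush(B);
    *B->p++ = *s++;
  }
}

/* One capacity test per chunk: copy as much as fits, then make room. */
static void toy_add_chunked (ToyBuffer *B, const char *s, size_t l) {
  while (l) {
    size_t n;
    size_t avail = toy_free(B);
    B->checks++;
    if (avail == 0) {
      toy_flush(B);
      avail = toy_free(B);
    }
    n = avail <= l ? avail : l;
    assert(n > 0);  /* holds because avail > 0 and l > 0 here */
    memcpy(B->p, s, n);
    B->p += n;
    s += n;
    l -= n;
  }
}
```

With this 16-byte toy buffer and 100 bytes of input, the byte-wise version
performs 100 capacity tests and the chunked version performs 7.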

For a trivial LuaSocket script that receives data, the change cuts the number
of instructions executed by more than 50%.  (I use Valgrind's cachegrind tool
to count instructions.)  I recall (although it's been a while) that it cut the
instruction count of our app by about 13% overall.

The change only minimally increases the code size (16 bytes larger on x86):

chuck@magma:~/lua-perf$ nm -S lua.orig | grep addlstring
08059ec0 00000068 T luaL_addlstring
chuck@magma:~/lua-perf$ nm -S lua.perf | grep addlstring
08059ec0 00000078 T luaL_addlstring

I made the change on 5.1.4, but it looks like 5.2 has the same performance
issue.

The patch is below, followed by the trivial test script and the Valgrind output.

-- 
Chuck



--- lua-5.1.4.orig/src/lauxlib.c    2008-01-21 06:20:51.000000000 -0700
+++ lua-5.1.4/src/lauxlib.c 2010-03-12 05:48:39.000000000 -0700
@@ -434,8 +434,19 @@
 
 
 LUALIB_API void luaL_addlstring (luaL_Buffer *B, const char *s, size_t l) {
-  while (l--)
-    luaL_addchar(B, *s++);
+  while (l) {
+    size_t min;
+    size_t avail = bufffree(B);
+    if (!avail) {
+      luaL_prepbuffer(B);
+      avail = bufffree(B);
+    }
+    min = avail <= l ? avail : l;
+    memcpy(B->p, s, min);
+    B->p += min;
+    s += min;
+    l -= min;
+  }
 }
 



chuck@magma:~/lua-perf$ cat echo.lua 
require "socket"

server, err = socket.bind("*", 0)
assert(server, err)
ip, port = server:getsockname()
print("Please telnet to localhost on port " .. port)
client, err = server:accept()
total = 0
while not err do
  data, err = client:receive(4096)
  if err then break end
  client:send("receive(" .. tostring(#data) .. ")")
  total = total + #data
  if total >= 1024*1024 then break end
end
client:close()
chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua 
==8565== Cachegrind, a cache and branch-prediction profiler
==8565== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
==8565== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
==8565== Command: ./lua.orig echo.lua
==8565== 
Please telnet to localhost on port 60342
==8565== 
==8565== I   refs:      16,345,890
==8565== I1  misses:        28,037
==8565== L2i misses:         2,426
==8565== I1  miss rate:       0.17%
==8565== L2i miss rate:       0.01%
==8565== 
==8565== D   refs:       8,229,564  (4,619,517 rd   + 3,610,047 wr)
==8565== D1  misses:        53,970  (   32,310 rd   +    21,660 wr)
==8565== L2d misses:         5,991  (    3,442 rd   +     2,549 wr)
==8565== D1  miss rate:        0.6% (      0.6%     +       0.5%  )
==8565== L2d miss rate:        0.0% (      0.0%     +       0.0%  )
==8565== 
==8565== L2 refs:           82,007  (   60,347 rd   +    21,660 wr)
==8565== L2 misses:          8,417  (    5,868 rd   +     2,549 wr)
==8565== L2 miss rate:         0.0% (      0.0%     +       0.0%  )
chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua 
[...snip...]
==8570== I   refs:      16,345,917
[...snip...]
chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.orig echo.lua 
[...snip...]
==8575== I   refs:      16,345,861
[...snip...]
chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua 
==8580== Cachegrind, a cache and branch-prediction profiler
==8580== Copyright (C) 2002-2009, and GNU GPL'd, by Nicholas Nethercote et al.
==8580== Using Valgrind-3.5.0-Debian and LibVEX; rerun with -h for copyright info
==8580== Command: ./lua.perf echo.lua
==8580== 
Please telnet to localhost on port 43713
==8580== 
==8580== I   refs:      7,179,931
==8580== I1  misses:       26,354
==8580== L2i misses:        2,422
==8580== I1  miss rate:      0.36%
==8580== L2i miss rate:      0.03%
==8580== 
==8580== D   refs:      4,563,501  (2,786,905 rd   + 1,776,596 wr)
==8580== D1  misses:       54,462  (   32,711 rd   +    21,751 wr)
==8580== L2d misses:        5,989  (    3,440 rd   +     2,549 wr)
==8580== D1  miss rate:       1.1% (      1.1%     +       1.2%  )
==8580== L2d miss rate:       0.1% (      0.1%     +       0.1%  )
==8580== 
==8580== L2 refs:          80,816  (   59,065 rd   +    21,751 wr)
==8580== L2 misses:         8,411  (    5,862 rd   +     2,549 wr)
==8580== L2 miss rate:        0.0% (      0.0%     +       0.1%  )
chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua 
[...snip...]
==8586== I   refs:      7,179,933
[...snip...]
chuck@magma:~/lua-perf$ valgrind --tool=cachegrind ./lua.perf echo.lua 
[...snip...]
==8591== I   refs:      7,179,922
[...snip...]
chuck@magma:~/lua-perf$
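
As a quick check of the "more than 50%" figure, the reduction can be computed
from the "I refs" totals of the first lua.orig and lua.perf runs above.  This
is only a small sketch; percent_reduction and report_irefs are hypothetical
helper names, and the two constants are copied from the cachegrind output.

```c
#include <stdio.h>

/* Percentage by which `after` is smaller than `before`. */
static double percent_reduction (double before, double after) {
  return 100.0 * (before - after) / before;
}

/* Print the reduction in instruction references between the two builds. */
static void report_irefs (void) {
  const double orig = 16345890.0;  /* I refs, first lua.orig run */
  const double perf = 7179931.0;   /* I refs, first lua.perf run */
  printf("I refs reduced by %.1f%%\n", percent_reduction(orig, perf));
}
```

Calling report_irefs() gives a reduction of roughly 56%, consistent with the
claim of more than 50% for this test script.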
