- Subject: Re: Understanding 'perf report' result lua 5.2.3: __memcpy_sse2_unaligned ?
- From: "Karsten Schulz" <kahnpost@...>
- Date: Tue, 9 Dec 2014 16:45:32 +0100
SSE optimized memcpy not faster
http://software.intel.com/en-us/forums/topic/475426
Greetings
Karsten
http://flexxvision.de/luascript.html
-----Original Message-----
From: Valerio Schiavoni
Sent: Tuesday, December 09, 2014 4:36 PM
To: Lua mailing list
Subject: Re: Understanding 'perf report' result lua 5.2.3: __memcpy_sse2_unaligned ?
Hello Roberto,
thanks for your explanation.
On Tue, Dec 9, 2014 at 3:36 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>> What is happening that triggers so many '__memcpy_sse2_unaligned' calls?
> If I understood the report correctly, there is no indication that there
> are too many '__memcpy_sse2_unaligned'; it is big only in comparison
> with the rest. If all your server does is to move data around (e.g.,
> it reads it from somewhere, creates a Lua string with it, and then writes
> it somewhere else),
Well, in my test-case, this is all the server does:
local data = clientsocket:receive(payload_size)
https://gist.github.com/vschiavoni/315af2d2ea91876506a2#file-webserver_splay-lua-L18
As you see, the data is read/received from a (non-blocking) LuaSocket
and then simply ignored until the end of the function.
On a 1 Gb/s network, this single call to receive takes an average of 5.3
seconds when payload_size is large (128 MB).
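For scale, here is a back-of-the-envelope check of those numbers, sketched in Lua. It assumes that 128 MB means 128 * 2^20 bytes and that the link carries a full 10^9 bits per second with no protocol overhead; `wire_time` is a name made up for this illustration.

```lua
-- Ideal transfer time in seconds for `bytes` over a link of `bits_per_s`.
-- Ignores TCP/IP and Ethernet framing overhead (an assumption).
local function wire_time(bytes, bits_per_s)
  return bytes * 8 / bits_per_s
end

local payload_bytes = 128 * 1024 * 1024   -- 128 MB, taken as MiB
local link_rate     = 1e9                 -- 1 Gb/s
local observed      = 5.3                 -- seconds, from the measurement above

local ideal = wire_time(payload_bytes, link_rate)
print(string.format("ideal: %.2f s", ideal))             -- ideal: 1.07 s
print(string.format("ratio: %.1fx", observed / ideal))   -- ratio: 4.9x
```

Under these assumptions the raw transfer accounts for roughly one second of the 5.3, so most of the time would have to be spent somewhere other than on the wire.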
Should I conclude that the LuaSocket binding takes some time to copy
the received data back onto the Lua stack (somewhere around here:
https://github.com/diegonehab/luasocket/blob/master/src/buffer.c#L136
)?
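One way to test that hypothesis would be to read the payload in smaller pieces, so that no single 128 MB Lua string has to be built. A minimal sketch, assuming only that `sock` has a LuaSocket-style `receive(n)` method that returns either the data or `nil, err, partial`; `drain` and the 64 KB default chunk size are hypothetical choices for illustration, not part of LuaSocket:

```lua
-- Read `total` bytes from `sock` in chunks of at most `chunk_size` bytes
-- and discard them. Returns the number of bytes actually read, plus the
-- error message if the read stopped early.
local function drain(sock, total, chunk_size)
  chunk_size = chunk_size or 64 * 1024
  local received = 0
  while received < total do
    local want = math.min(chunk_size, total - received)
    local data, err, partial = sock:receive(want)
    if data then
      received = received + #data
    else
      -- Count whatever arrived before the error, then give up.
      -- On a non-blocking socket, err may simply be "timeout".
      received = received + #(partial or "")
      return received, err
    end
  end
  return received
end
```

If the `__memcpy_sse2_unaligned` share in `perf report` drops noticeably with `drain(clientsocket, payload_size)` in place of the single receive, that would point at the cost of assembling the large string; if it does not, the copying is happening elsewhere. On a non-blocking socket the function returns early with "timeout", so the caller would need to wait until the socket is readable and call it again for the remaining bytes.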
best,
valerio