Subject: Re: Understanding 'perf report' result lua 5.2.3: __memcpy_sse2_unaligned ?
From: "Karsten Schulz" <kahnpost@...>
Date: Tue, 9 Dec 2014 16:45:32 +0100

An SSE-optimized memcpy is not faster:

http://software.intel.com/en-us/forums/topic/475426


Greetings
Karsten

http://flexxvision.de/luascript.html

-----Original Message-----
From: Valerio Schiavoni
Sent: Tuesday, December 09, 2014 4:36 PM
To: Lua mailing list
Subject: Re: Understanding 'perf report' result lua 5.2.3: __memcpy_sse2_unaligned ?


Hello Roberto,
thanks for your explanation.

On Tue, Dec 9, 2014 at 3:36 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:

>> What is it happening that triggers that many '__memcpy_sse2_unaligned' ?
>
> If I understood the report correctly, there is no indication that there
> are too many '__memcpy_sse2_unaligned'; it is big only in comparison
> with the rest. If all your server does is to move data around (e.g.,
> it reads it from somewhere, creates a Lua string with it, and then writes
> it somewhere else),


Well, in my test-case, this is all the server does:

local data = clientsocket:receive(payload_size)

https://gist.github.com/vschiavoni/315af2d2ea91876506a2#file-webserver_splay-lua-L18

As you see, the data is read/received from a (non-blocking) LuaSocket
and then simply ignored until the end of the function.
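
For anyone who wants to reproduce the measurement, here is a minimal sketch of how a single call like the one above could be timed. The helper name `timed` is hypothetical (it is not part of LuaSocket or of the gist), and it assumes Lua 5.2 or later for table.pack/table.unpack. The clock is passed in so that a wall-clock source such as LuaSocket's socket.gettime can be used instead of os.clock, which only counts CPU time.

```lua
-- Hypothetical helper: run f(...) once and return the elapsed time
-- (according to the supplied clock) followed by all of f's results.
local function timed(clock, f, ...)
  local t0 = clock()
  -- table.pack keeps trailing nils, e.g. the (nil, err, partial)
  -- triple that receive returns on failure.
  local results = table.pack(f(...))
  return clock() - t0, table.unpack(results, 1, results.n)
end

-- Usage with LuaSocket (assumes `clientsocket` and `payload_size`
-- already exist, as in the snippet above):
--   local socket = require("socket")
--   local elapsed, data, err, partial =
--     timed(socket.gettime, clientsocket.receive, clientsocket, payload_size)
--   print(string.format("receive took %.3f s", elapsed))
```

Timing only the receive call this way separates it from connection setup and from whatever the rest of the handler does.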

On a 1 Gb/s network, this single call to receive takes 5.3 seconds on
average when payload_size is large (128 MB).
Should I conclude that the LuaSocket binding takes some time to copy
the received data back onto the stack (somewhere around
https://github.com/diegonehab/luasocket/blob/master/src/buffer.c#L136
)?
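
To put the 5.3 seconds in perspective, here is a back-of-the-envelope calculation of the best-case transfer time for this payload. It is only a sketch under stated assumptions: 1 MB is taken as 2^20 bytes, the link is taken as exactly 10^9 bits per second, and protocol overhead is ignored. The variable names are made up for the example.

```lua
-- Assumptions (not from the measurements): 1 MB = 2^20 bytes,
-- 1 Gb/s = 1e9 bits per second, no TCP/IP or Ethernet overhead.
local payload_bytes   = 128 * 2^20
local link_bits_per_s = 1e9
local observed_s      = 5.3   -- average reported above

-- Time the wire alone would need for the payload.
local ideal_s = payload_bytes * 8 / link_bits_per_s
-- Throughput actually achieved, in megabits per second.
local effective_mbit_per_s = payload_bytes * 8 / observed_s / 1e6
-- How many times slower than the wire limit the call is.
local slowdown = observed_s / ideal_s

print(string.format("ideal transfer time:  %.2f s", ideal_s))
print(string.format("effective throughput: %.1f Mb/s", effective_mbit_per_s))
print(string.format("slowdown vs. wire:    %.1fx", slowdown))
```

Under these assumptions the wire needs roughly 1.07 s, so the observed call is close to five times slower than the link alone would explain.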


best,
valerio
