- Subject: Re: Understanding 'perf report' result lua 5.2.3: __memcpy_sse2_unaligned ?
- From: "Karsten Schulz" <kahnpost@...>
- Date: Tue, 9 Dec 2014 16:45:32 +0100
An SSE-optimized memcpy is not faster.
From: Valerio Schiavoni
Sent: Tuesday, December 09, 2014 4:36 PM
To: Lua mailing list
Subject: Re: Understanding 'perf report' result lua 5.2.3:
Thanks for your explanation.
On Tue, Dec 9, 2014 at 3:36 PM, Roberto Ierusalimschy wrote:
What is happening that triggers so many '__memcpy_sse2_unaligned' calls?
If I understood the report correctly, there is no indication that there
are too many '__memcpy_sse2_unaligned'; it is big only in comparison
with the rest. If all your server does is to move data around (e.g.,
it reads it from somewhere, creates a Lua string with it, and then writes
it somewhere else),
Well, in my test case, this is all the server does:
local data = clientsocket:receive(payload_size)
As you see, the data is read/received from a (non-blocking) LuaSocket
and then simply ignored until the end of the function.
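For readers following along: LuaSocket's receive returns either the data or nil, an error message, and a partial result, so a large read on a non-blocking socket may need several calls. Here is a minimal sketch of such a loop. The helper name drain, the stub fake_socket, and the chunk size are all hypothetical; the stub stands in for a real socket so the sketch runs without LuaSocket.

```lua
-- Hypothetical helper (not part of LuaSocket): read `total` bytes in pieces
-- of at most `chunk_size`, discarding each piece as it arrives.
local function drain(sock, total, chunk_size)
  local received = 0
  while received < total do
    local want = math.min(chunk_size, total - received)
    -- receive returns data, or nil, err, partial on timeout or close
    local data, err, partial = sock:receive(want)
    local piece = data or partial
    if piece then
      received = received + #piece
    end
    if not data and err ~= "timeout" then
      return received, err   -- "closed" or another error
    end
    -- Real code would wait (e.g. with socket.select) after a timeout
    -- instead of retrying immediately.
  end
  return received
end

-- Stub socket (an assumption for illustration): hands out at most `step`
-- bytes per call and reports "timeout" when it delivers fewer than requested.
local function fake_socket(size, step)
  local left = size
  return {
    receive = function(self, n)
      if left == 0 then return nil, "closed", "" end
      local got = math.min(n, step, left)
      left = left - got
      local s = string.rep("x", got)
      if got < n then return nil, "timeout", s end
      return s
    end
  }
end

local n, err = drain(fake_socket(1000, 64), 1000, 256)
print(n, err)
```

Reading in smaller pieces like this is one way to check whether the time goes into the network or into building one very large Lua string.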
On a 1 Gb/s network, this single call to receive takes 5.3 seconds on
average when payload_size is large (128 MB).
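To put that number in perspective, here is a rough calculation of the time the wire alone would need. It assumes that 128MB means 128 * 1024 * 1024 bytes and that the link carries 10^9 bits per second, and it ignores protocol overhead. The variable names are illustrative only.

```lua
-- Back-of-the-envelope check under the assumptions stated above.
local payload_bytes = 128 * 1024 * 1024   -- assumed meaning of "128MB"
local link_bits_per_s = 1e9               -- assumed meaning of "1Gbs"
local measured_s = 5.3                    -- average reported in the post

local payload_bits = payload_bytes * 8
local wire_time_s = payload_bits / link_bits_per_s        -- ideal transfer time
local effective_mbit_s = payload_bits / measured_s / 1e6  -- achieved rate
local slowdown = measured_s / wire_time_s

print(string.format("ideal wire time: %.2f s", wire_time_s))
print(string.format("effective rate:  %.1f Mbit/s", effective_mbit_s))
print(string.format("slowdown factor: %.1f", slowdown))
```

Under these assumptions the transfer would need about 1.07 seconds at line rate, so the measured 5.3 seconds is roughly five times slower than the link allows.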
Should I conclude that it takes some time for the LuaSocket binding to copy
the received data back onto the stack (somewhere here