- Subject: Re: LuaJIT string concat
- From: Peter Cawley <lua@...>
- Date: Wed, 13 Apr 2011 13:43:20 +0100
On Wed, Apr 13, 2011 at 1:27 PM, Francesco Abbate
<francesco.bbt@gmail.com> wrote:
> 2011/4/13 Alexander Gladysh <agladysh@gmail.com>:
>> Did you actually profile it?
>>
>> I bet that in plain Lua the version with `string.format` is slower. (One
>> extra global lookup, one extra table index, one extra function call.)
>
> hmmm... weak argument, you can do:
>
> local format = string.format
>
> to get rid of the lookups but this is a standard Lua idiom.
>
> Otherwise I don't have any benchmark and I'm not going to do it but I
> note that string.format is a Lua native function that is supposed to
> be optimized.
>
> The expression
>
> '[' .. s .. ']'
>
> implies:
> - the creation of a temporary string bigger than s itself
> - the copy of the original string to the new allocated memory
> - append the additional data
>
> all this repeated two times. In general this pattern is suboptimal
> because of the repeated heap allocation and memory copying. Normally
> string.format has some chance to follow a more optimal pattern.
For the standard VM, let us look at the VM bytecode for the two options:
***************************************************************
C:\Users\Peter>luac -l -
local result, s
result = '[' .. s .. ']'
^Z
main <stdin:0,0> (5 instructions, 20 bytes at 00957E80)
0+ params, 5 slots, 0 upvalues, 2 locals, 2 constants, 0 functions
1 [2] LOADK 2 -1 ; "["
2 [2] MOVE 3 1
3 [2] LOADK 4 -2 ; "]"
4 [2] CONCAT 0 2 4
5 [2] RETURN 0 1
C:\Users\Peter>luac -l -
local format = string.format
local result, s
result = format("[%s]", s)
^Z
main <stdin:0,0> (9 instructions, 36 bytes at 00907E80)
0+ params, 6 slots, 0 upvalues, 3 locals, 3 constants, 0 functions
1 [1] GETGLOBAL 0 -1 ; string
2 [1] GETTABLE 0 0 -2 ; "format"
3 [2] LOADNIL 1 2
4 [3] MOVE 3 0
5 [3] LOADK 4 -3 ; "[%s]"
6 [3] MOVE 5 2
7 [3] CALL 3 3 2
8 [3] MOVE 1 3
9 [3] RETURN 0 1
***************************************************************
The first option does LOADK, MOVE, LOADK, CONCAT, whereas the second
option does MOVE, LOADK, MOVE, CALL, MOVE (the GETGLOBAL and GETTABLE
for string.format happen once, when the local is initialised).
Assuming that MOVE and LOADK both have negligible cost, we are left to
compare the CONCAT with the CALL. The CONCAT boils down to a single
call to luaV_concat with three values, whereas the CALL invokes all of
the function-calling machinery and then boils down to luaL_buffinit,
luaL_addchar, luaL_addvalue, luaL_addchar and luaL_pushresult. For
small sizes, the auxiliary library's buffer system uses a fixed-size
buffer on the C stack, whereas for large sizes it will end up calling
luaV_concat either once or twice. As for luaV_concat, it reuses a
variably sized buffer within the lua_State, so it rarely needs to
allocate a temporary buffer. Hence my conclusion is that, at least for
the PUC-Rio VM, the concatenation option is cheaper than the format
option.
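Since the thread started with "Did you actually profile it?", the bytecode reasoning can also be checked empirically. What follows is a minimal micro-benchmark sketch, not a rigorous measurement: the iteration count, the hypothetical set of input strings and the use of os.clock() are all assumptions, and results may well differ between PUC-Rio Lua versions, LuaJIT, and short versus long strings (the buffer behaviour described above changes once strings get large).

```lua
-- Micro-benchmark sketch: '[' .. s .. ']' versus string.format("[%s]", s).
-- Assumptions (not from the original post): os.clock() timing, an arbitrary
-- iteration count and a hypothetical set of short input strings.
local format = string.format
local clock = os.clock

local ITERATIONS = 1000000  -- arbitrary; raise it if the timings are too small

-- Hypothetical inputs: short numeric strings, varied per iteration so the
-- loop body is not loop-invariant.
local NINPUTS = 100
local inputs = {}
for i = 1, NINPUTS do
  inputs[i] = tostring(i * 7919)
end

local function bench_concat(n)
  local result
  for i = 1, n do
    result = '[' .. inputs[i % NINPUTS + 1] .. ']'
  end
  return result
end

local function bench_format(n)
  local result
  for i = 1, n do
    result = format("[%s]", inputs[i % NINPUTS + 1])
  end
  return result
end

-- Returns the elapsed CPU time and the last string produced by f.
local function time(f, n)
  local t0 = clock()
  local r = f(n)
  return clock() - t0, r
end

local t_concat, r_concat = time(bench_concat, ITERATIONS)
local t_format, r_format = time(bench_format, ITERATIONS)

-- Both idioms must build the same string for the comparison to be meaningful.
assert(r_concat == r_format)

print(format("concat: %.3f s", t_concat))
print(format("format: %.3f s", t_format))
```

Running the same file under both lua and luajit gives a rough per-VM comparison. The inputs are drawn from a small precomputed table rather than a single constant string, since a JIT compiler might otherwise hoist or simplify loop-invariant work and make either idiom look artificially cheap.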