lua-l archive

On Wed, Apr 13, 2011 at 1:27 PM, Francesco Abbate
<francesco.bbt@gmail.com> wrote:
> 2011/4/13 Alexander Gladysh <agladysh@gmail.com>:
>> Did you actually profile it?
>>
>> I bet that in plain Lua the version with `string.format` is slower. (One
>> extra global lookup, one extra table index, one extra function call.)
>
> hmmm... weak argument, you can do:
>
> local format = string.format
>
> to get rid of the lookups but this is a standard Lua idiom.
>
> Otherwise I don't have any benchmark and I'm not going to do it but I
> note that string.format is a Lua native function that is supposed to
> be optimized.
>
> The expression
>
> '[' .. s .. ']'
>
> implies:
> - the creation of a temporary string bigger than s itself
> - the copy of the original string to the new allocated memory
> - append the additional data
>
> all this repeated two times. In general this pattern is suboptimal
> because of the repeated heap allocation and memory copying. Normally
> string.format has some chance to follow a more optimal pattern.

For the standard VM, let us look at the VM bytecode for the two options:

***************************************************************
C:\Users\Peter>luac -l -
local result, s
result = '[' .. s .. ']'
^Z

main <stdin:0,0> (5 instructions, 20 bytes at 00957E80)
0+ params, 5 slots, 0 upvalues, 2 locals, 2 constants, 0 functions
        1       [2]     LOADK           2 -1    ; "["
        2       [2]     MOVE            3 1
        3       [2]     LOADK           4 -2    ; "]"
        4       [2]     CONCAT          0 2 4
        5       [2]     RETURN          0 1

C:\Users\Peter>luac -l -
local format = string.format
local result, s
result = format("[%s]", s)
^Z

main <stdin:0,0> (9 instructions, 36 bytes at 00907E80)
0+ params, 6 slots, 0 upvalues, 3 locals, 3 constants, 0 functions
        1       [1]     GETGLOBAL       0 -1    ; string
        2       [1]     GETTABLE        0 0 -2  ; "format"
        3       [2]     LOADNIL         1 2
        4       [3]     MOVE            3 0
        5       [3]     LOADK           4 -3    ; "[%s]"
        6       [3]     MOVE            5 2
        7       [3]     CALL            3 3 2
        8       [3]     MOVE            1 3
        9       [3]     RETURN          0 1
***************************************************************

The first option executes LOADK, MOVE, LOADK, CONCAT, whereas the
second executes MOVE, LOADK, MOVE, CALL, MOVE (the GETGLOBAL and
GETTABLE only initialise the `format` local, a one-off cost that can
be hoisted out of any loop). Assuming that MOVE and LOADK both have
negligible cost, we are left to compare the CONCAT to the CALL. Note
that the single CONCAT covers all three operands at once, so no
intermediate string is built for '[' .. s. It boils down to one call
to luaV_concat on three values, whereas the CALL goes through all the
calling machinery and then boils down to luaL_buffinit, luaL_addchar,
luaL_addvalue, luaL_addchar, luaL_pushresult. For small sizes, the
auxiliary library buffer system uses a fixed-size buffer on the C
stack, whereas for large sizes it will end up calling luaV_concat
either once or twice. As for luaV_concat, it reuses a resizable buffer
kept in the Lua state, so it rarely needs to allocate a temporary
buffer. Hence my conclusion: at least for the PUC Rio VM, the
concatenation option is better than the format option.
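Nobody in the thread has posted timings, so here is a minimal sketch of
a micro-benchmark one could run to check the bytecode reasoning above.
It assumes a stock Lua 5.1 interpreter (it should also run on later
versions and on LuaJIT). The iteration count N and the string lengths
are arbitrary, hypothetical choices, and os.clock() only gives CPU time,
so treat any numbers it prints as a rough indication, not a profile.

```lua
-- Hypothetical micro-benchmark (not from the original thread): times
-- '[' .. s .. ']' against string.format("[%s]", s) in plain Lua.
-- Results vary with the Lua version, the VM (PUC Rio vs LuaJIT),
-- the string length and GC activity.
local format = string.format  -- cached in a local, as suggested upthread
local clock = os.clock

-- Time n evaluations of the concatenation form; also return the last result.
local function time_concat(s, n)
  local r
  collectgarbage("collect")   -- start from a similar heap state
  local t0 = clock()
  for _ = 1, n do
    r = '[' .. s .. ']'       -- one CONCAT over three registers in 5.1
  end
  return clock() - t0, r
end

-- Time n evaluations of the string.format form; also return the last result.
local function time_format(s, n)
  local r
  collectgarbage("collect")
  local t0 = clock()
  for _ = 1, n do
    r = format("[%s]", s)     -- a CALL into the C function behind string.format
  end
  return clock() - t0, r
end

-- Hypothetical parameters: a few string lengths, fixed iteration count.
local N = 100000
for _, len in ipairs({ 8, 64, 4096 }) do
  local s = string.rep("x", len)
  local tc, rc = time_concat(s, N)
  local tf, rf = time_format(s, N)
  assert(rc == rf)            -- both forms must build the same string
  print(format("len=%-5d concat=%.3fs format=%.3fs", len, tc, tf))
end
```

Both loops keep the result in a local so the work cannot be skipped
outright, and the GC is collected before each run so one form does not
pay for the other's garbage.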