I noticed that OP_CONCAT and LoadString in undump.c both use memcpy() twice if the resulting string is not already interned. Since the result length is known early, the long string case can quite easily be optimized to copy the data just once. This seems to cut the time for a 25+25 character concat by around 12%, and probably more for longer strings. I see no reproducible regression in the short string case.
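To make the idea concrete, here is a minimal standalone sketch of the two strategies. It is not the patch and does not use Lua's internals: `Str`, `str_alloc`, `concat_two_copies` and `concat_one_copy` are hypothetical names invented for this illustration, and plain malloc() stands in for Lua's allocator and string table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a string object: a header followed by the bytes. */
typedef struct Str {
  size_t len;
  char data[];  /* C99 flexible array member */
} Str;

/* Allocate an object able to hold `len` bytes plus a terminating '\0'. */
Str *str_alloc(size_t len) {
  Str *s = malloc(sizeof(Str) + len + 1);
  if (s == NULL) return NULL;
  s->len = len;
  s->data[len] = '\0';
  return s;
}

/* Two passes: assemble the result in a scratch buffer, then copy it again
   into the freshly allocated object. */
Str *concat_two_copies(const char *a, size_t la, const char *b, size_t lb) {
  assert(a != NULL && b != NULL);
  size_t total = la + lb;
  char *scratch = malloc(total + 1);
  if (scratch == NULL) return NULL;
  memcpy(scratch, a, la);
  memcpy(scratch + la, b, lb);
  Str *s = str_alloc(total);
  if (s != NULL) memcpy(s->data, scratch, total);  /* second pass over the same bytes */
  free(scratch);
  return s;
}

/* One pass: the total length is known up front, so allocate the object
   first and copy each piece straight into its final place. */
Str *concat_one_copy(const char *a, size_t la, const char *b, size_t lb) {
  assert(a != NULL && b != NULL);
  Str *s = str_alloc(la + lb);
  if (s == NULL) return NULL;
  memcpy(s->data, a, la);
  memcpy(s->data + la, b, lb);
  return s;
}
```

Both functions return the same bytes; the second simply skips the intermediate buffer. This shortcut only fits strings that are not interned, since an interned string has to be fully assembled before it can be looked up in the string table.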
$ cat testcase.lua
local a = string.rep('A', 25)
local b = string.rep('B', 25)
for i = 1, 1e8 do local c = a..b end
$ time ./lua-5.3.1 testcase.lua
$ time ./lua-patched testcase.lua
Could anyone please verify the correctness of the patch, or run it against a real test suite? Thanks!