My motivation for this work was to try to improve Lua's performance and to learn in the process.
About strlen.
The length of a constant string is known at compile time, and I believe it is worth taking advantage of that.
See the assembler generated for luaS_new:
With strlen:
  inc r8
  cmp BYTE PTR [r11+r8], 0
  jne SHORT $LL19@luaS_new
  mov rdx, r11
  mov rcx, rdi
  call luaS_newlstr
Without strlen:
  mov r8, rsi
  mov rdx, r11
  mov QWORD PTR [rbx+8], rax
  mov rcx, rdi
  call luaS_newlstr
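A minimal sketch of the compile-time length idea, outside of Lua. The names new_lstr, new_str and new_literal are hypothetical stand-ins (they are not the functions in the patch); the point is only that sizeof on a string literal gives the length without a run-time loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for luaS_newlstr: just records what it was given. */
typedef struct { const char *s; size_t len; } StrView;

static StrView new_lstr(const char *s, size_t len) {
  StrView v;
  v.s = s;
  v.len = len;
  return v;
}

/* Run-time length: walks the string, like the cmp/jne loop in the listing. */
static StrView new_str(const char *s) {
  return new_lstr(s, strlen(s));
}

/* Compile-time length: only valid for string literals.
   The "" s concatenation makes a non-literal argument a compile error,
   and sizeof(s) - 1 drops the terminating '\0'. */
#define new_literal(s)  new_lstr("" s, sizeof(s) - 1)
```

With this, new_literal("hello") passes 5 as a constant, while new_str("hello") has to count the characters first.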
I also believe that string arrays should have power-of-two sizes, for the same reason that structures are padded.
So I think it is worthwhile to waste a little space, when necessary, to simplify the address calculations.
I believe that power-of-two arrays can lead to more stable execution times for the factorial.lua program used in the tests.
After these changes, the timing peaks became smaller, and some runs gave noticeably low values.
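To make the address-calculation argument concrete, here is a small sketch with a hypothetical table of names (not taken from the patch). With a 16-byte row, the offset of row i is a shift; with a row size such as 12 or 24 the compiler needs a multiply or a longer lea sequence. Whether this is measurable in practice depends on the compiler and the CPU.

```c
#include <stddef.h>

#define NAME_SIZE  16  /* hypothetical row size, a power of two */
#define NAME_SHIFT 4   /* log2(NAME_SIZE) */

static const char names[4][NAME_SIZE] = { "step", "count", "stop", "restart" };

/* Byte offset of row i: a single shift because NAME_SIZE is a power of two. */
static size_t row_offset(size_t i) {
  return i << NAME_SHIFT;
}

/* Same address that names[i] yields, computed by hand from the base address. */
static const char *row_by_shift(size_t i) {
  return (const char *)names + row_offset(i);
}
```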
It would be possible to create lua_getlfield and lua_setlfield, which take the length (size_t), but that may be excessive.
In lua_pushlstring, it would in theory be possible to use only luaS_new:
ts = (len == 0) ? luaS_new(L, "", 0) : luaS_new(L, s, len);
But for some reason that I did not investigate, this breaks some tests.
So, for now, some long strings in lua_pushlstring do not use the cache.
It is possible to turn lua_pushstring into:
LUA_API const char *lua_pushstring (lua_State *L, const char *s) {
  return lua_pushlstring(L, s, strlen(s));
}
But this change would break compatibility:
when lua_pushstring is called with a NULL string, it pushes nil,
whereas when lua_pushlstring is called with a NULL string, it pushes "".
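The difference can be shown with a toy model that does not link against Lua. The types and the model_* functions below are hypothetical; they only mimic the behaviour described above. The sketch also shows that an explicit NULL test would be needed in any case, since strlen(NULL) is undefined.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the value left on the stack; not Lua's real types. */
typedef enum { V_NIL, V_STRING } ValueKind;
typedef struct { ValueKind kind; size_t len; } Value;

/* Models lua_pushlstring: always a string; with len == 0 the result is "",
   whatever s points to (including NULL). */
static Value model_pushlstring(const char *s, size_t len) {
  Value v;
  (void)s;  /* the model does not copy the bytes */
  v.kind = V_STRING;
  v.len = len;
  return v;
}

/* Models a lua_pushstring that delegates to the function above
   but keeps the NULL -> nil behaviour. */
static Value model_pushstring(const char *s) {
  if (s == NULL) {
    Value v;
    v.kind = V_NIL;
    v.len = 0;
    return v;
  }
  return model_pushlstring(s, strlen(s));
}
```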
Attached is the patch with all the suggested changes.
I hope they help bring some improvement to Lua; if not, thank you for the opportunity to learn.
lapi.c:
1. lua_lock (L) / lua_unlock (L) moved, following the approach already used in lua_pushlstring and auxgetstr,
where it is safe to use the data before locking and after unlocking.
2. lua_pushlstring / lua_pushstring modified to make use of luaS_new with string size (size_t).
3. auxgetstr / auxsetstr modified to make use of luaS_new with string size (size_t).
4. lua_concat modified to use new luaS_newshrstr function that creates / reuses a new short string.
lauxlib.c:
5. findfield / pushfuncname modified to use lua_pushconstant, created to add a constant string,
without having to modify lua_pushliteral, so as not to create compatibility problems.
6. luaL_traceback modified to add '\n' as a char and not as a string.
7. luaL_traceback / resizebox / luaL_loadfilex modified to make use of lua_addconstant, which adds a constant string to a buffer.
8. luaL_tolstring modified to make use of lua_pushconstant created to add a constant string,
without having to modify lua_pushliteral, so as not to create compatibility problems.
lauxlib.h
9. New luaL_addconstant (L, s) macro created to add constant string to a buffer.
lbaselib.c:
10. luaB_tonumber modified to test the most likely cases first, which can lead to fewer branches.
I believe that when calling to make a conversion, in most cases, the argument will not be a LUA_TNUMBER.
11. pushmode modified to use lua_pushconstant created to add a constant string.
12. luaB_collectgarbage modified to use strings that are (power 2) in size.
Also modified to make use of lua_pushconstant.
lcorolib.c:
13. auxresume modified to make use of lua_pushconstant.
ldblib.c:
14. modified hookf to use arrays that are (power 2) in size.
Removed a useless cast.
15. db_sethook / db_gethook modified to make use of lua_pushconstant created to add a constant string.
16. db_debug modified to use strings that are (power 2) in size.
ldebug.c
17. luaG_traceexec is an important function, because it is called from lvm.c.
Modified so that it does not execute unnecessary instructions when there are no hooks.
Note: I would like to turn it into:
int luaG_traceexec (lua_State *L, const Instruction *pc) { return 0; }
for the case where lvm runs without any debugging.
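The idea in item 17 can be sketched with a toy state (hypothetical names, not the code in ldebug.c): test the hook mask before touching anything else, so that the common no-hook case returns immediately.

```c
/* Hypothetical mask bits, standing in for the line and count hook masks. */
#define MASK_LINE  (1 << 2)
#define MASK_COUNT (1 << 3)

typedef struct {
  int hookmask;   /* which hooks are installed */
  int trap;       /* nonzero: the VM must call the trace function */
  int slow_runs;  /* counts how often the expensive path was taken */
} ToyState;

/* Returns 0 when tracing can be turned off, 1 when a hook had to be handled. */
static int toy_traceexec(ToyState *st) {
  if (!(st->hookmask & (MASK_LINE | MASK_COUNT))) {  /* no hooks? */
    st->trap = 0;  /* no need to stop here again */
    return 0;      /* nothing else is loaded or computed */
  }
  st->slow_runs++;  /* placeholder for the real line/count hook work */
  return 1;
}
```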
ldo.c:
18. luaD_seterrorobj modified to make use of lua_newsliteral, created
to add a short constant string.
19. resume_error modified to make use of luaS_new with size.
liolib.c:
20. io_type / f_tostring / test_eof / io_noclose modified to make use of lua_pushconstant.
21. L_MAXLENNUM macro modified to create a string (power 2).
llex.c:
22. Modified luaX_tokens to contain the sizes of reserved words.
23. luaX_init modified to use lua_newsliteral which adds a short constant string and
luaS_new with size.
24. luaX_token2str modified to use luaX_tokens as a structure.
25. luaX_setinput modified to make use of lua_newsliteral, created
to add short constant strings.
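As an illustration of item 22, a table of reserved words can carry each length as a compile-time constant. This is only a sketch: the TokenName type, the TOKEN macro and the shortened word list are assumptions, not the layout used in the patch.

```c
#include <stddef.h>

/* Hypothetical entry type: a reserved word together with its length. */
typedef struct { const char *name; size_t len; } TokenName;

/* sizeof(s) - 1 is evaluated by the compiler; "" s rejects non-literals. */
#define TOKEN(s)  { "" s, sizeof(s) - 1 }

static const TokenName tokens[] = {
  TOKEN("and"), TOKEN("break"), TOKEN("do"),
  TOKEN("else"), TOKEN("elseif"), TOKEN("end")
};

#define NUM_TOKENS  (sizeof(tokens) / sizeof(tokens[0]))
```

A loop over tokens can then pass tokens[i].name and tokens[i].len straight to a function that takes a length, without calling strlen for each word.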
llimits.h:
26. LUAI_MAXSHORTLEN macro modified to size 64 (power 2).
27. STRCACHE_N macro modified to size 64 (power 2).
lmathlib.c:
28. modified math_type to make use of lua_pushconstant.
lmem.c:
29. luaM_malloc_ modified to test the most likely case first.
loadlib.c:
30. lsys_load / lsys_sym modified to make use of lua_pushconstant,
since lua_pushliteral currently does not make use of sizeof at compile time.
31. ll_loadlib / searcher_preload / findloader / luaopen_package modified
to make use of lua_pushconstant.
32. pusherrornotfound modified to make use of lua_addconstant.
lobject.c:
33. L_MAXLENNUM macro modified to create constant string (power 2).
34. MAXNUMBER2STR macro modified to be compatible with LUAI_MAXSHORTLEN.
35. modified tostringbuff to make use of luaS_newshrstr which adds
a short constant string.
Note: A compile-time assert is required to prevent errors.
36. BUFVFS modified to create a string (power 2).
lopcodes.h
37. GET_OPCODE macro modified to remove a useless cast,
at least for its use in the lvm switch. In profiler tests,
removing the cast reduced by almost 1s the total time spent in
vmdispatch (GET_OPCODE (i))
vmdispatch (GET_OPCODE (i)) without cast:
; Line 1153
  mov eax, ebx
  shr edx, 7
  and eax, 127 ; 0000007fH
  movzx r10d, dl
  mov r12d, r10d
  shl r12, 4
  lea rdi, QWORD PTR [r12+r11]
  cmp eax, 81 ; 00000051H
  ja SHORT $LL2@luaV_execu
  lea r8, OFFSET FLAT:__ImageBase
  mov ecx, DWORD PTR $LN606@luaV_execu[r8+rax*4]
  add rcx, r8
  jmp rcx
vmdispatch (GET_OPCODE (i)) with cast:
; Line 1153
  mov eax, ebx
  shr edx, 7
  and eax, 127 ; 0000007fH
  movzx r10d, dl
  mov r12d, r10d
  shl r12, 4
  lea rdi, QWORD PTR [r12+r11]
  cmp eax, 81 ; 00000051H
  ja SHORT $LL2@luaV_execu
  lea r8, OFFSET FLAT:__ImageBase
  cdqe ; instruction removed
  mov ecx, DWORD PTR $LN606@luaV_execu[r8+rax*4]
  add rcx, r8
  jmp rcx
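A reduced sketch of where the extra cdqe can come from, using hypothetical names (Instr, ToyOp, GET_OP_CAST, GET_OP_NOCAST) rather than the real macros. When the masked value is cast to a signed enum, a compiler may sign-extend it to 64 bits before using it as a jump-table index; if it stays unsigned, the 32-bit result is already zero-extended on x86-64. The exact code generated depends on the compiler.

```c
#include <stdint.h>

typedef uint32_t Instr;                    /* stand-in for Instruction */
typedef enum { OP_A, OP_B, OP_C } ToyOp;   /* stand-in for OpCode */

#define SIZE_OP  7
#define OP_MASK  ((1u << SIZE_OP) - 1u)

/* With cast: the value becomes a (typically signed) enum. */
#define GET_OP_CAST(i)    ((ToyOp)((i) & OP_MASK))

/* Without cast: the value stays unsigned. */
#define GET_OP_NOCAST(i)  ((i) & OP_MASK)

/* Dispatch on the uncast value; the case labels are still the enum constants. */
static int dispatch(Instr i) {
  switch (GET_OP_NOCAST(i)) {
    case OP_A: return 10;
    case OP_B: return 20;
    case OP_C: return 30;
    default:   return -1;
  }
}
```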
lparser.c
38. undefgoto / leaveblock / gotostat / test_then_block modified to make use of lua_newsliteral, which
adds a short constant string.
39. luaY_parser modified to make use of luaS_new with size.
lstate.h
40. global_State structure modified so that tmname is array (power 2).
lstring.c:
41. luaS_hash modified to remove the len variable.
42. luaS_resize modified to test the most likely case first,
luaM_reallocvector will rarely fail.
43. luaS_init modified to make use of luaS_newsliteral
which adds a short constant string.
44. createstrobj modified to remove the totalsize variable.
Also modified to store ts->u.lnglen, which helps luaS_createlngstrobj.
45. luaS_createlngstrobj modified to use createstrobj directly.
46. internshrstr renamed to luaS_newshrstr so that it can be used
directly.
Note: Added assert to ensure that it is used with short strings.
memcpy moved to generate better asm.
luaS_newshrstr before:
; Line 206
  mov r8, rdi
  mov rdx, rbp
  mov rbx, rax
  lea rcx, QWORD PTR [rax+24]
  call memcpy
luaS_newshrstr after:
; Line 208
  mov r8, rdi
  mov rdx, rbp
  mov rbx, rax
  mov BYTE PTR [rax+11], dil
  mov rcx, QWORD PTR [r14]
  mov QWORD PTR [rax+16], rcx
  lea rcx, QWORD PTR [rax+24] ; instruction moved
  call memcpy
47. luaS_newlstr modified to use luaS_newshrstr (new name).
48. luaS_new modified to accept the size (size_t) of the strings.
lstring.h
49. luaS_newsliteral created to add a short constant string.
lstrlib.c
50. str_sub / str_rep / createmetatable modified to make use of lua_pushconstant.
51. modified tonum to test the most likely cases first.
52. addquoted modified to use the result of l_sprintf which returns the length of the string.
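Item 52 relies on the fact that the printf family returns the number of characters written. A minimal sketch with standard snprintf follows; format_int is a hypothetical helper, not a function from lstrlib.c.

```c
#include <stdio.h>

/* Formats n into buff and returns the number of characters written,
   taken from snprintf's return value instead of a second strlen pass.
   Returns 0 if formatting failed or the output did not fit. */
static size_t format_int(char *buff, size_t size, long n) {
  int written = snprintf(buff, size, "%ld", n);
  if (written < 0 || (size_t)written >= size)
    return 0;
  return (size_t)written;
}
```

The returned length can then be passed directly to a function that takes a size, such as one that appends to a buffer.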
ltable.c:
53. luaH_next modified to call gnode (t, i) only once.
54. luaH_resizearray modified to use the C99 feature, which allows
creating an array initialized with zeros.
ltablib.c:
55. checkfield modified to accept string length,
making it possible to use lua_pushlstring.
56. checktab modified to call checkfield with the length of the strings.
ltm.c:
57. luaT_init modified to use luaT_eventname as a structure.
58. luaT_init modified to use luaS_new with size.
59. luaT_objtypename modified to make use of luaS_new with size.
ltm.h:
60. TM_ARRAY_SIZE macro created to allow tmname to be an array (power 2).
lua.c:
61. multiline modified to use lua_pushconstant.
lua.h:
62. Created lua_pushconstant
63. LUA_NUMTAGS modified to make it possible to create an array (power 2).
lundump.c:
64. loadStringN modified to use luaS_newshrstr.
lvm.c:
65. l_strton modified to test the most likely cases first.
66. luaV_concat modified to use luaS_newshrstr.
lvm.h:
67. luaV_fastgeti macro modified to test the most likely cases first.
In lvm.c (OP_GETTABLE, OP_SETTABLE, and others), the value will be a Table most of the time.
68. luaV_fastrawget created to replace luaV_fastget in (OP_GETTABLE and OP_SETTABLE).
Please do not apply this one:
if (ttisinteger(rc)  /* fast track for integers? */
    ? (cast_void(n = ivalue(rc)), luaV_fastgeti(L, rb, n, slot))
    : luaV_fastrawget(L, rb, rc, slot, luaH_get)) {
  setobj2s(L, ra, slot);
}
luaV_fastrawget breaks, so it is not possible to optimize the second part.
regards,
Ranier Vilela