- Subject: Re: Lua (small optimizations)
- From: Andrew Gierth <andrew@...>
- Date: Sun, 13 Dec 2020 15:00:26 +0000
>>>>> "Ranier" == Ranier Vilela <firstname.lastname@example.org> writes:
Ranier> What motivated me to do this job is to try to improve Lua's
Ranier> performance and learn during the process.
Ranier> About strlen.
Ranier> At compile time it is possible to obtain the size of a constant
Ranier> string, and I believe, it is worth using this advantage.
The assumption is that any constant string where you care about the
performance of lua_pushstring will already be in the string cache and
therefore strlen will never be called.
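Here is a rough sketch of why that holds. It is a pointer-keyed cache loosely modeled on luaS_new in Lua 5.4's lstring.c: a bucket is chosen by the pointer value modulo N, there are M entries per bucket, and a hit is confirmed with strcmp. The names cache_lookup, make_string, CACHE_N, CACHE_M and the strlen_calls counter are hypothetical and exist only for illustration. The point is that strlen runs only on the miss path, so it runs once per distinct literal.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_N 53  /* number of buckets (prime) */
#define CACHE_M 2   /* entries per bucket */

/* Hypothetical stand-in for an interned string object. */
typedef struct Str {
  size_t len;
  char *data;  /* NUL-terminated copy of the contents */
} Str;

static unsigned long strlen_calls = 0;  /* counts slow-path length computations */
static Str *cache[CACHE_N][CACHE_M];    /* static storage: all slots start NULL */

/* Slow path: build a string of known length (the luaS_newlstr role). */
static Str *make_string(const char *s, size_t len) {
  Str *ts = malloc(sizeof *ts);
  if (ts == NULL || (ts->data = malloc(len + 1)) == NULL)
    abort();  /* out of memory: give up in this sketch */
  memcpy(ts->data, s, len);
  ts->data[len] = '\0';
  ts->len = len;
  return ts;
}

/* Fast path: look the pointer up first; strlen is reached only on a miss. */
static Str *cache_lookup(const char *s) {
  unsigned int i = (unsigned int)((uintptr_t)s % CACHE_N);
  Str **p = cache[i];
  int j;
  for (j = 0; j < CACHE_M; j++) {
    if (p[j] != NULL && strcmp(s, p[j]->data) == 0)
      return p[j];  /* hit: no strlen needed */
  }
  for (j = CACHE_M - 1; j > 0; j--)
    p[j] = p[j - 1];  /* drop the oldest entry (leaked here for brevity) */
  strlen_calls++;
  p[0] = make_string(s, strlen(s));
  return p[0];
}
```

Calling cache_lookup twice with the same literal returns the same object and computes the length only once.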
Unless you can show relevant statistics about the miss rate of the
string cache, all this messing with constant strings is entirely
Ranier> I also believe that string arrays should be power of two in
Ranier> size, for the same reason that there is the padding of the
If you're referring to the string cache, the N dimension is set to be a
prime number, for good reason - using a power of two for the modulus is
extremely likely to produce collisions on the cache.
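A small experiment makes the collision argument concrete. The sketch below uses a hypothetical helper, distinct_buckets, and assumes addresses spaced 16 bytes apart, which is typical of aligned allocations. It counts how many buckets 64 such addresses occupy under a modulus of 64 and under a modulus of 53.

```c
#include <stddef.h>

/* Count how many distinct buckets `count` addresses, spaced `stride` bytes
   apart starting at `base`, occupy when hashed as addr % nbuckets. */
static int distinct_buckets(unsigned long base, unsigned long stride,
                            int count, unsigned int nbuckets) {
  unsigned char used[256] = {0};  /* assumes nbuckets <= 256 */
  int k, distinct = 0;
  for (k = 0; k < count; k++) {
    unsigned long b = (base + (unsigned long)k * stride) % nbuckets;
    if (!used[b]) {
      used[b] = 1;
      distinct++;
    }
  }
  return distinct;
}
```

With a power-of-two modulus only the low bits of the address matter. Under 16-byte alignment the low four bits are always zero, so distinct_buckets(0x10000, 16, 64, 64) yields 4 of 64 buckets. Because 16 and 53 are coprime, distinct_buckets(0x10000, 16, 64, 53) uses all 53.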
The loop over M is a good candidate to be unrolled anyway, so there are
no significant benefits here.
Ranier> It is possible to create lua_getlfield and lua_setlfield, which
Ranier> make use of size (size_t), but it may be excessive.
Again, this is why there's a string cache.
Ranier> In lua_pushlstring, it would be possible in theory to use only
The use of luaS_newlstr is to bypass the cache, so it's intentional. The
cache is set up in such a way that it only works for strings that are
known not to contain \0.
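The \0 restriction follows from how a hit is checked. A cache of this shape takes only a pointer and confirms a hit with strcmp, as luaS_new does in Lua 5.4, and strcmp stops at the first \0. The helpers same_by_strcmp and same_by_length below are hypothetical. They show two distinct counted strings that strcmp treats as equal, which is why such strings must take the length-aware path.

```c
#include <string.h>

/* Compare as C strings: everything after the first '\0' is ignored. */
static int same_by_strcmp(const char *a, const char *b) {
  return strcmp(a, b) == 0;
}

/* Compare as counted strings: every byte up to the length is significant. */
static int same_by_length(const char *a, size_t alen,
                          const char *b, size_t blen) {
  return alen == blen && memcmp(a, b, alen) == 0;
}
```

For "ab\0cd" and "ab\0xy", same_by_strcmp reports a match but same_by_length does not.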
Ranier> 10. luaB_tonumber modified to test the most likely cases first,
Ranier> which can lead to fewer branches.
Two compilers I tested (clang 10 and gcc 10) disagree with you about how
the resulting code should actually be ordered.
Ranier> 12. luaB_collectgarbage modified to use strings that are (power
Ranier> 2) in size.
Adding an extra NULL to make the array 12 elements rather than 11 has no
particular benefit here. (What _would_ be nice is replacing checkoption
with something that doesn't need to use an array of pointers to constant
strings, which is a known irritant because of the relocation
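One relocation-free alternative is to store the names inline in a two-dimensional char array. An array of const char * holds one pointer per entry, and in a position-independent shared library each of those pointers needs a load-time relocation. A char[][N] table contains no pointers at all. The sketch below uses the option names collectgarbage accepts in Lua 5.4. The table name gc_opts and the function find_option are hypothetical and are not Lua's checkoption.

```c
#include <string.h>

/* Option names stored inline: one contiguous block of chars with no
   pointers, so this table needs no load-time relocations. */
static const char gc_opts[][13] = {
  "stop", "restart", "collect", "count", "step", "setpause",
  "setstepmul", "isrunning", "generational", "incremental"
};

#define N_GC_OPTS (sizeof gc_opts / sizeof gc_opts[0])

/* Hypothetical checkoption-style lookup: returns the index of name in the
   table, or -1 if it is not a valid option. */
static int find_option(const char *name) {
  size_t i;
  for (i = 0; i < N_GC_OPTS; i++) {
    if (strcmp(name, gc_opts[i]) == 0)
      return (int)i;
  }
  return -1;
}
```

The cost is padding, because every entry is as wide as the longest name ("generational" plus its NUL). For a table this small that is usually a good trade against a relocation per entry.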
Ranier> 14. modified hookf to use arrays that are (power 2) in size.
Ranier> 16. db_debug to use strings that are (power 2) in size.
Ranier> 17. luaG_traceexec is an important function, because it is
Ranier> executed in lvm.c. Modified to not execute instructions that
Ranier> are unnecessary, in the case where there are no hooks.
You seem to have missed the fact that lvm.c does not even call
luaG_traceexec AT ALL unless trap is true, which it is not if there are
no hooks. The early "return 0;" case in traceexec is only hit when hooks
have just been turned off, so it's not a hot path.
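The pattern described above looks roughly like the toy loop below, in which the tracer is reached only while trap is set. The names trace_instruction and run and the trace_calls counter are hypothetical. This is a model of the control flow, not lvm.c itself. The tracer's return value plays the role of deciding whether trap stays on.

```c
/* Counts how often the tracer runs, so the effect of `trap` is visible. */
static int trace_calls = 0;

/* Stand-in for luaG_traceexec: returns nonzero while hooks remain active. */
static int trace_instruction(int hooks_on) {
  trace_calls++;
  return hooks_on;
}

/* Toy dispatch loop over a "program" of integers to add up. */
static int run(const int *code, int n, int hooks_on) {
  int trap = hooks_on;  /* set only when some hook is installed */
  int acc = 0, pc;
  for (pc = 0; pc < n; pc++) {
    if (trap)  /* no hooks: a single, well-predicted test per instruction */
      trap = trace_instruction(hooks_on);
    acc += code[pc];  /* "execute" the instruction */
  }
  return acc;
}
```

With hooks off, the tracer is never called. Any work saved inside it therefore cannot show up on the hot path.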
Ranier> 21. L_MAXLENNUM macro modified to create a string (power 2).
Ranier> 22. Modified luaX_tokens to contain the sizes of reserved words.
It would be nicer to make that use something other than an array of
pointers to literals, to save on relocations when built as a shared
Ranier> 26. LUAI_MAXSHORTLEN macro modified to size 64 (power 2).
There is no reason at all to believe that powers of 2 are beneficial
here. The division between long and short strings should be based on
whether interning them is an overall benefit.
Ranier> 27. STRCACHE_N macro modified to size 64 (power 2).
As noted above, using a power of 2 here is actively bad. This is a cache
hashed by modulus of the pointer value, after all.
Ranier> 29. luaM_malloc_ modified to test the most likely case first.
Again, compilers disagree with you about what to do about these cases.
Ranier> 33. L_MAXLENNUM macro modified to create constant string (power 2).
Ranier> 34. MAXNUMBER2STR macro modified to be compatible with
This is just wrong. MAXNUMBER2STR should be no longer than is needed to
accommodate the actual result of converting a number to a string. There
is no reason why it should have anything to do with short strings.
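One way to see what the buffer must hold is to measure the conversion directly. The sketch assumes the "%.14g" format, Lua 5.4's default LUAI_NUMFFORMAT for doubles. It uses snprintf(NULL, 0, ...), which is C99, so it is a measuring tool rather than something for Lua's own C89 sources. The function name formatted_len is hypothetical.

```c
#include <stdio.h>

/* Number of characters "%.14g" produces for d, not counting the NUL. */
static int formatted_len(double d) {
  return snprintf(NULL, 0, "%.14g", d);
}
```

For example, 0.1 formats as "0.1" (3 characters) and -1.2345678901234e-300 takes 21. The bound follows from the numeric formats in use (integers have their own format), not from LUAI_MAXSHORTLEN.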
Ranier> 36. BUFVFS modified to create a string (power 2).
Ranier> 37. GET_OPCODE macro modified to remove a useless cast,
Ranier> at least for use on the lvm switch, in profiler tests,
Ranier> removing the cast decreased by almost 1s the total time spent in
Ranier> vmdispatch (GET_OPCODE (i))
The performance difference here is probably spurious (the instruction
being removed is certainly spurious, too: there is no reason for it to
be there even with the cast).
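The cast only changes the type of an already-masked value, not the value itself. The sketch assumes the Lua 5.4 layout, with 32-bit instructions and a 7-bit opcode at bit 0 (SIZE_OP 7, POS_OP 0). The enum is truncated and illustrative, and the _CAST/_NOCAST macro names and wrapper functions are hypothetical.

```c
#include <stdint.h>

typedef uint32_t Instruction;  /* assumption: 32-bit instructions */

#define SIZE_OP 7  /* assumed opcode width */
#define POS_OP  0  /* assumed opcode position */
#define MASK1(n,p)  ((~((~(Instruction)0) << (n))) << (p))

typedef enum { OP_MOVE, OP_LOADI, OP_LOADF, OP_LOADK } OpCode;  /* truncated */

#define GET_OPCODE_CAST(i)   ((OpCode)(((i) >> POS_OP) & MASK1(SIZE_OP, 0)))
#define GET_OPCODE_NOCAST(i) (((i) >> POS_OP) & MASK1(SIZE_OP, 0))

static int opcode_with_cast(Instruction i)    { return (int)GET_OPCODE_CAST(i); }
static int opcode_without_cast(Instruction i) { return (int)GET_OPCODE_NOCAST(i); }
```

Both forms yield the same value for every instruction word, for example 5 for 0xABCDEF85. Any difference in generated code would therefore be a compiler artifact rather than something the cast requires.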
Ranier> 40. global_State structure modified so that tmname is array
Ranier> (power 2).
Ranier> 42. luaS_resize modified to test the most likely case first,
Ranier> luaM_reallocvector will rarely fail.
Again, reordering the source code does not help with this. If a
condition genuinely is unlikely, that's what unlikely() is for.
By removing that, you probably made things much worse.
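For reference, the usual shape of such a hint is sketched below. GCC and Clang provide __builtin_expect, and on other compilers the macro degrades to the plain condition. The name my_unlikely and the grow_array helper are hypothetical, and Lua's own macro may be spelled differently. The failure branch stays where it reads naturally in the source, and the hint lets the compiler move it off the hot path.

```c
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define my_unlikely(x) (__builtin_expect(((x) != 0), 0))
#else
#define my_unlikely(x) (x)
#endif

/* Hypothetical resize helper. On failure it returns NULL and the caller
   still owns the old block, as with realloc itself. */
static int *grow_array(int *block, size_t newsize) {
  int *newblock = realloc(block, newsize * sizeof *newblock);
  if (my_unlikely(newblock == NULL))
    return NULL;  /* rare path */
  return newblock;
}
```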
Ranier> memcpy moved from place to generate a better asm.
I don't see why you think it is any better.
Ranier> 54. luaH_resizearray modified to use the C99 feature, which
Ranier> allows creating an array initialized with zeros.
On my compiler, this generates the exact same code. Note that Lua is
expected to compile as C89, even though this is not the default.