Thanks for the tip!
Seems like I got another ~4% improvement with those flags (although I've enabled them globally)!
What are they doing in the interpreter? Better jumps when executing the Lua instructions?
A custom allocator also benefits me on Linux; the results I've shown were all on Linux.
It seems to work fine, is much simpler to use (a single C file), performs the same as mimalloc,
and is much faster than my system's standard malloc.
I think it could be even faster if the thread safety were removed from that allocator.
All the good C allocators I can find out there are thread-safe, and I don't see much need for that in the Lua case.
I've also experimented with porting LuaJIT's allocator to Lua 5.4 (it was quite easy to do),
and it performed worse than mimalloc or rpmalloc for my use case.
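
For anyone who wants to try the same thing, here is a minimal sketch of the shape of a Lua allocator function. It is not the code I benchmarked: it just wraps the standard realloc/free where a real allocator's calls would go, and the AllocStats struct and the name custom_alloc are made up for illustration. The function signature is the one Lua expects for lua_Alloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping passed to the allocator through the `ud` pointer. */
typedef struct AllocStats {
    size_t live_bytes; /* bytes currently allocated through custom_alloc */
} AllocStats;

/*
 * Same signature as lua_Alloc:
 *   void *(*lua_Alloc)(void *ud, void *ptr, size_t osize, size_t nsize);
 * nsize == 0 means "free ptr"; otherwise allocate or resize to nsize bytes.
 * When ptr is NULL, osize is not a size (Lua uses it to encode the kind of
 * object being created), so it is ignored in that case.
 */
static void *custom_alloc(void *ud, void *ptr, size_t osize, size_t nsize) {
    AllocStats *stats = (AllocStats *)ud;

    if (nsize == 0) {
        if (ptr != NULL)
            stats->live_bytes -= osize;
        free(ptr); /* swap in the custom allocator's free here */
        return NULL;
    }

    /* swap in the custom allocator's realloc here */
    void *newptr = realloc(ptr, nsize);
    if (newptr != NULL) {
        if (ptr != NULL)
            stats->live_bytes -= osize;
        stats->live_bytes += nsize;
    }
    return newptr; /* NULL tells Lua the allocation failed */
}
```

With Lua 5.4 this would be installed with something like `lua_State *L = lua_newstate(custom_alloc, &stats);` instead of luaL_newstate(). Since a Lua state is normally used from one thread at a time, nothing here needs locking.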
Regards,
Eduardo Bart