In a quick test here, where I looped a million times sorting tables
with 200 elements, the overall test in the new version took 50s and it
spent 0.45s in kernel time (whereas in previous Lua versions, of
course, no time is spent in the kernel for the same test). The constant
back-and-forth between userspace and kernel can be seen by running that
test under strace.
That's on Linux; in my experience going through syscalls on Cygwin,
for example, is way more expensive.
Wouldn't it be possible, alternatively, to initialize a PRNG once with
the clock data when the VM loads and then produce values from that for
table.sort? (Or is producing pseudorandom numbers more expensive than
switching into the kernel twice to get time and clock data?)
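
To make the suggestion concrete, here is a rough sketch in C of what I
have in mind. It is not taken from the Lua source: the SketchState
struct and the sketch_* function names are hypothetical, and xorshift32
is just one cheap generator I picked for illustration. The clock is read
only once, at seeding time; every later pivot choice is a few shifts and
XORs done entirely in userspace.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-VM state; a real VM would keep this next to its
   other global state. Not part of the actual Lua source. */
typedef struct {
    uint32_t sort_seed;
} SketchState;

/* Called once when the VM is created. This is the only place that
   reads the clock, so it is the only place that may enter the kernel. */
static void sketch_seed_init(SketchState *s) {
    uint32_t seed = (uint32_t)clock() ^ (uint32_t)time(NULL);
    /* xorshift state must never be zero, so fall back to a constant. */
    s->sort_seed = seed ? seed : 0x9E3779B9u;
}

/* One xorshift32 step (shift constants 13, 17, 5): pure userspace
   arithmetic, no syscalls. */
static uint32_t sketch_next(SketchState *s) {
    uint32_t x = s->sort_seed;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    s->sort_seed = x;
    return x;
}

/* Pick a pivot index in the inclusive range [lo, up]; assumes lo <= up.
   The modulo introduces a slight bias, which should not matter for
   pivot selection. */
static size_t sketch_choose_pivot(SketchState *s, size_t lo, size_t up) {
    size_t span = up - lo + 1;
    return lo + (size_t)(sketch_next(s) % span);
}
```

The sort would then call sketch_choose_pivot() wherever it currently
asks for clock and time data. Whether this is random enough for what the
randomized pivot is meant to guard against is a separate question; the
sketch only shows that the per-sort syscalls can be avoided.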