I tried replacing the PRNG in LuaJIT 2.0.5 lib_math.c with xoshiro256**, but did not see any math.random speedup (against the stock Tausworthe generator).
After swapping s[0] and s[1] and applying the scrambler to the updated state instead of the old one, it matched LuaJIT's Tausworthe speed. (Note: it is still xoshiro256**.)
LuaJIT, 1 billion math.random(), best of 3:
49.2s  Tausworthe: LuaJIT stock PRNG
50.1s  Xoshiro256**: http://xoshiro.di.unimi.it/xoshiro256starstar.c
48.9s  Xoshiro256** (modified): lj_math_random_step() patch below
static inline uint64_t rotl(const uint64_t x, int k)
{
  return (x << k) | (x >> (64 - k));
}

LJ_NOINLINE uint64_t LJ_FASTCALL lj_math_random_step(RandomState *rs)
{
  uint64_t *s = rs->gen;                /* modified xoshiro256** */
  uint64_t t = s[0] << 17;              /* s[0], s[1] swapped */
  s[2] ^= s[1];
  s[3] ^= s[0];
  s[0] ^= s[2];
  s[1] ^= s[3];
  s[2] ^= t;
  s[3] = rotl(s[3], 45);
  uint64_t r = rotl(s[0] * 5, 7) * 9;   /* scramble updated state */
  return (r & U64x(000fffff,ffffffff)) | U64x(3ff00000,00000000);
}
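To back up the "still xoshiro256**" note, here is a small standalone check, written as a sketch outside LuaJIT. It assumes the reference step scrambles s[1] before advancing the state (as I recall it from xoshiro256starstar.c), and it drops the double-mantissa masking so raw 64-bit outputs can be compared. The names ref_next, mod_next and outputs_shifted_by_one and the seed values are made up for illustration. If the modified state starts as the reference state with s[0] and s[1] swapped, the modified generator should emit the same stream, just one output ahead.

```c
#include <assert.h>
#include <stdint.h>

static inline uint64_t rotl64(uint64_t x, int k) {
  return (x << k) | (x >> (64 - k));
}

/* Reference-style step (assumed layout): scramble s[1], then advance. */
static uint64_t ref_next(uint64_t s[4]) {
  uint64_t result = rotl64(s[1] * 5, 7) * 9;
  uint64_t t = s[1] << 17;
  s[2] ^= s[0];
  s[3] ^= s[1];
  s[1] ^= s[2];
  s[0] ^= s[3];
  s[2] ^= t;
  s[3] = rotl64(s[3], 45);
  return result;
}

/* Modified step from the patch, minus the mantissa masking:
 * advance first (s[0], s[1] swapped), then scramble the updated s[0]. */
static uint64_t mod_next(uint64_t s[4]) {
  uint64_t t = s[0] << 17;
  s[2] ^= s[1];
  s[3] ^= s[0];
  s[0] ^= s[2];
  s[1] ^= s[3];
  s[2] ^= t;
  s[3] = rotl64(s[3], 45);
  return rotl64(s[0] * 5, 7) * 9;
}

/* Returns 1 if modified outputs 1..n equal reference outputs 2..n+1. */
int outputs_shifted_by_one(int n) {
  uint64_t a[4] = { 1, 2, 3, 4 };  /* arbitrary nonzero seed (illustrative) */
  uint64_t b[4] = { 2, 1, 3, 4 };  /* same seed with s[0] and s[1] swapped */
  int i;
  assert(n >= 0);
  (void)ref_next(a);               /* skip the first reference output */
  for (i = 0; i < n; i++)
    if (ref_next(a) != mod_next(b)) return 0;
  return 1;
}
```

Calling outputs_shifted_by_one(1000) from a test main is enough to exercise it. The point of the reordering is only scheduling: the state update no longer has to wait on a result computed from the old state.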
Vigna's recommendation for generating a random double from xoshiro256**:
> There's no detectable difference between the bits. Theoretically, however,
> the upper bits have a slightly higher linear complexity, so if you don't have
> any other criterion I'd say to use the high bits.
>
> -- Vigna 5/9/2018
So a better patch may be to use the high bits:
< return (r & U64x(000fffff,ffffffff)) | U64x(3ff00000,00000000);
> return (r >> 12) | U64x(3ff00000,00000000);
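For anyone comparing the two return lines, this standalone sketch shows what each does to a raw 64-bit output r. It assumes the caller reinterprets the returned bits as an IEEE 754 double in [1, 2) and subtracts 1.0, which is how I remember stock math.random using the value. The helper names (unit_from_bits, unit_from_low_bits, unit_from_high_bits) are hypothetical, and UINT64_C stands in for LuaJIT's U64x macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a bit pattern with exponent 0x3ff as a double in [1, 2),
 * then shift the result down to [0, 1). */
static double unit_from_bits(uint64_t bits) {
  double d;
  assert(sizeof d == sizeof bits);   /* assumes a 64-bit IEEE 754 double */
  memcpy(&d, &bits, sizeof d);
  return d - 1.0;
}

/* Original patch line: the low 52 bits of r become the mantissa. */
double unit_from_low_bits(uint64_t r) {
  return unit_from_bits((r & UINT64_C(0x000fffffffffffff))
                        | UINT64_C(0x3ff0000000000000));
}

/* Suggested line: the high 52 bits of r become the mantissa. */
double unit_from_high_bits(uint64_t r) {
  return unit_from_bits((r >> 12) | UINT64_C(0x3ff0000000000000));
}
```

Both variants cost one bit operation plus an OR, so the choice should not affect the timings above. The only difference is which 52 of the 64 scrambled bits end up in the result.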