- Subject: sincos optimization - lua and luajit
- From: David Manura <dm.lua@...>
- Date: Sat, 23 Jul 2011 22:25:29 -0400
There are a number of occasions (e.g. rotation transformations) where
both the sine and cosine of a number need to be computed together, and
it can be more efficient to do this as a single operation [1,2]. To
take a trivial benchmark,
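#include <stdio.h>
#include <math.h>

/* static inline: a plain C99 "inline" definition may leave no external
   symbol to link against when the call is not inlined (e.g. at -O0). */
static inline double f(double angle) {
    double s = sin(angle);   /* sin and cos of the same argument ... */
    double c = cos(angle);   /* ... are candidates for one sincos operation */
    return s + c;
}

int main(void) {
    double i;
    double sum = 0;
    for (i = 1; i < 1e7; i++) { sum += f(i); }
    printf("%e\n", sum);
    return 0;
}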
gcc 4.4.4 (even under "-ffast-math -msse2") compiles that to an fsincos
instruction. Intel C++ 2011 compiles it to a ___libm_sse2_sincos call.
MSVC++ 2010 compiles it to separate __CIcos/__CIsin calls, to fsin/fcos
under -fp:fast, or to ___libm_sse2_sin/___libm_sse2_cos under -arch:SSE2.
Here's the same benchmark in LuaJIT2, followed by the loop part of its trace (IR, then machine code) as dumped by luajit -jdump:
local function f(angle)
  local s = math.sin(angle)
  local c = math.cos(angle)
  return s + c
end
local sum = 0
for i = 1, 1e7 do sum = sum + f(i) end
print(sum)
0027 ------ LOOP ------------
0028 num CONV 0025 num.int
0029 num FPMATH 0028 sin
0030 num FPMATH 0028 cos
0031 num ADD 0030 0029
0032 + num ADD 0031 0024
0033 + int ADD 0025 +1
0034 > int LE 0033 +10000000
0035 int PHI 0025 0033
0036 num PHI 0024 0032
---- TRACE 1 mcode 352
->LOOP:
b77d7fb0 xorps xmm6, xmm6
b77d7fb3 cvtsi2sd xmm6, edi
b77d7fb7 movsd [esp+0x8], xmm6
b77d7fbd fld qword [esp+0x8]
b77d7fc1 fsin
b77d7fc3 fstp qword [esp]
b77d7fc6 movsd xmm5, [esp]
b77d7fcb fld qword [esp+0x8]
b77d7fcf fcos
b77d7fd1 fstp qword [esp]
b77d7fd4 movsd xmm6, [esp]
b77d7fd9 addsd xmm6, xmm5
b77d7fdd addsd xmm7, xmm6
b77d7fe1 add edi, +0x01
b77d7fe4 cmp edi, 0x00989680
b77d7fea jle 0xb77d7fb0 ->LOOP
b77d7fec jmp 0xb77d0014 ->3
---- TRACE 1 stop -> loop
I suppose the optimizer could recognize adjacent sin/cos operations on
the same operand in the IR (0029 and 0030 above) and merge them into a
single fsincos. If compiling sincos to SSE2 instead, you might need a
library like http://gruntthepeon.free.fr/ssemath/ .
This doesn't seem to make a whole lot of difference, though. Here
___libm_sse2_sincos is actually a little slower than fsincos, and the
speedup over the separate fsin/fcos instructions is only around 30%,
but that depends on your library implementation and its accuracy
level. It may make a bit more difference in standard Lua, and the lqd
binding provides a sincos [3]. Lua already has the somewhat related
math.atan2, though it exists for different reasons. Here's an example
of a math.sincos added to lmathlib.c:
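/* Returns sin(x) and cos(x) as two results: local s, c = math.sincos(x).
   Register it in the mathlib[] array, e.g. {"sincos", math_sincos}. */
static int math_sincos (lua_State *L) {
  lua_Number x = luaL_checknumber(L, 1);
  lua_pushnumber(L, l_tg(sin)(x));  /* first result: sine */
  lua_pushnumber(L, l_tg(cos)(x));  /* second result: cosine */
  return 2;                         /* number of values returned to Lua */
}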
[1] http://linux.die.net/man/3/sincos
[2] http://stackoverflow.com/questions/2683588/what-is-the-fastest-way-to-compute-sin-and-cos-together
[3] http://lua-users.org/lists/lua-l/2009-04/msg00143.html