Re: sincos optimization - lua and luajit

lua-l archive


Subject: Re: sincos optimization - lua and luajit
From: "Dimiter \"malkia\" Stanev" <malkia@...>
Date: Sat, 23 Jul 2011 20:08:10 -0700

I'd rather have them exposed as one function, sincos() or cossin(), returning two numbers.

How would luajit handle the case where math.cos is something else? I dunno, I'm not Mike, but I have a feeling that more "atomic" representations (such as cossin or sincos) might be more doable, and they could be right now. I guess if I want to add this in luajit, I'll just find a function that returns two things and copycat it.


On 7/23/11 7:25 PM, David Manura wrote:

There are a number of occasions (e.g. rotation transformations) where
both the sine and cosine of a number need to be computed together, and
it can be more efficient to do this as a single operation [1,2].  To
take a trivial benchmark,

#include<stdio.h>
#include<math.h>
static inline double f(double angle) {
	double s = sin(angle);
	double c = cos(angle);
	return s+c;
}
int main(void) {
	double i;
	double sum = 0;
	for (i=1; i<1e7; i++) { sum += f(i); }
	printf("%e\n", sum);
	return 0;
}

gcc 4.4.4 (even under "-ffast-math -msse2") compiles that to an fsincos
instruction.  Intel C++ 2011 compiles it to a ___libm_sse2_sincos call.
MSVC++ 2010 compiles it to separate __CIcos/__CIsin calls, fsin/fcos
under -fp:fast, or ___libm_sse2_sin/___libm_sse2_cos under -arch:SSE2.

Here's what we get with LuaJIT2:

   local function f(angle)
     local s = math.sin(angle)
     local c = math.cos(angle)
     return s + c
   end
   local sum = 0
   for i=1,1e7 do sum = sum + f(i) end
   print(sum)

   0027 ------ LOOP ------------
   0028    num CONV   0025  num.int
   0029    num FPMATH 0028  sin
   0030    num FPMATH 0028  cos
   0031    num ADD    0030  0029
   0032  + num ADD    0031  0024
   0033  + int ADD    0025  +1
   0034>   int LE     0033  +10000000
   0035    int PHI    0025  0033
   0036    num PHI    0024  0032
   ---- TRACE 1 mcode 352

   ->LOOP:
   b77d7fb0  xorps xmm6, xmm6
   b77d7fb3  cvtsi2sd xmm6, edi
   b77d7fb7  movsd [esp+0x8], xmm6
   b77d7fbd  fld qword [esp+0x8]
   b77d7fc1  fsin
   b77d7fc3  fstp qword [esp]
   b77d7fc6  movsd xmm5, [esp]
   b77d7fcb  fld qword [esp+0x8]
   b77d7fcf  fcos
   b77d7fd1  fstp qword [esp]
   b77d7fd4  movsd xmm6, [esp]
   b77d7fd9  addsd xmm6, xmm5
   b77d7fdd  addsd xmm7, xmm6
   b77d7fe1  add edi, +0x01
   b77d7fe4  cmp edi, 0x00989680
   b77d7fea  jle 0xb77d7fb0	->LOOP
   b77d7fec  jmp 0xb77d0014	->3
   ---- TRACE 1 stop ->  loop

I suppose the optimizer could recognize the adjacent sin/cos calls in
the IR and merge them to fsincos.  If compiling sincos to SSE2, you
might need a library like http://gruntthepeon.free.fr/ssemath/ .
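As a rough sketch of that merging idea, the C fragment below scans a toy instruction list for a sin and a cos that sit next to each other and read the same operand, and marks them as the two halves of one fused operation. Everything here (toy_op, toy_ins, fuse_sincos, the OP_SINCOS_* opcodes) is invented for illustration and is not LuaJIT's actual IR or optimizer API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes, loosely modelled on the trace dump above.
 * These are hypothetical and not LuaJIT's real IR. */
typedef enum { OP_OTHER, OP_SIN, OP_COS, OP_SINCOS_S, OP_SINCOS_C } toy_op;

typedef struct {
    toy_op op;
    int operand;  /* reference number of the input instruction */
} toy_ins;

/* Look for adjacent sin/cos (in either order) on the same operand
 * and rewrite them as the sine and cosine halves of a fused sincos.
 * Returns the number of pairs that were fused. */
static size_t fuse_sincos(toy_ins *ir, size_t n) {
    size_t fused = 0;
    assert(ir != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i++) {
        toy_ins *a = &ir[i];
        toy_ins *b = &ir[i + 1];
        int is_pair = (a->op == OP_SIN && b->op == OP_COS) ||
                      (a->op == OP_COS && b->op == OP_SIN);
        if (is_pair && a->operand == b->operand) {
            a->op = (a->op == OP_SIN) ? OP_SINCOS_S : OP_SINCOS_C;
            b->op = (b->op == OP_SIN) ? OP_SINCOS_S : OP_SINCOS_C;
            fused++;
            i++;  /* skip the partner we just rewrote */
        }
    }
    return fused;
}
```

A real pass would also have to cope with sin and cos that are not adjacent, and with code generation for the fused pair; this only shows the pattern match.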

This all doesn't seem to make a whole lot of difference though.
___libm_sse2_sincos is actually a little slower than fsincos here,
and the speedup is only maybe 30% over the separate fsin/fcos
instructions, but it depends on your library implementation and its
accuracy level.  It may make a bit more difference in standard Lua,
and the lqd binding has one [3].  Even Lua has the somewhat related
math.atan2, though not for the same reasons.  Here's an example of it
added to lmathlib.c:

   static int math_sincos (lua_State *L) {
     lua_Number x = luaL_checknumber(L, 1);
     lua_pushnumber(L, l_tg(sin)(x));
     lua_pushnumber(L, l_tg(cos)(x));
     return 2;
   }

[1] http://linux.die.net/man/3/sincos
[2] http://stackoverflow.com/questions/2683588/what-is-the-fastest-way-to-compute-sin-and-cos-together
[3] http://lua-users.org/lists/lua-l/2009-04/msg00143.html
