I'd rather have them exposed as one function sincos() or cossin() returning two numbers.

How would LuaJIT handle the case where math.cos has been replaced by something else? I dunno, I'm not Mike, but I have a feeling that a more "atomic" representation (such as cossin or sincos) might be more doable, and could be added right now. I guess if I want to add this to LuaJIT, I'll just find a function that returns two things and copycat it.

On 7/23/11 7:25 PM, David Manura wrote:

There are a number of occasions (e.g. rotation transformations) where
both the sine and cosine of a number need to be computed together, and
it can be more efficient to do this as a single operation [1,2].  To
take a trivial benchmark,

#include <math.h>
#include <stdio.h>

static inline double f(double angle) {
	double s = sin(angle);
	double c = cos(angle);
	return s+c;
}

int main(void) {
	double i;
	double sum = 0;
	for (i=1; i<1e7; i++) { sum += f(i); }
	printf("%e\n", sum);
	return 0;
}

gcc4.4.4 (even under "-ffast-math -msse2") compiles that to an fsincos
instruction.  Intel C++2011 compiles it to a ___libm_sse2_sincos call.
MSVC++2010 compiles it to separate __CIcos/__CIsin calls, fsin/fcos
under -fp:fast, or ___libm_sse2_sin/___libm_sse2_cos under -arch:SSE2.

Here's what we get with LuaJIT2:

   local function f(angle)
     local s = math.sin(angle)
     local c = math.cos(angle)
     return s + c
   end
   local sum = 0
   for i=1,1e7 do sum = sum + f(i) end

   0027 ------ LOOP ------------
   0028    num CONV   0025
   0029    num FPMATH 0028  sin
   0030    num FPMATH 0028  cos
   0031    num ADD    0030  0029
   0032  + num ADD    0031  0024
   0033  + int ADD    0025  +1
   0034>   int LE     0033  +10000000
   0035    int PHI    0025  0033
   0036    num PHI    0024  0032
   ---- TRACE 1 mcode 352

   b77d7fb0  xorps xmm6, xmm6
   b77d7fb3  cvtsi2sd xmm6, edi
   b77d7fb7  movsd [esp+0x8], xmm6
   b77d7fbd  fld qword [esp+0x8]
   b77d7fc1  fsin
   b77d7fc3  fstp qword [esp]
   b77d7fc6  movsd xmm5, [esp]
   b77d7fcb  fld qword [esp+0x8]
   b77d7fcf  fcos
   b77d7fd1  fstp qword [esp]
   b77d7fd4  movsd xmm6, [esp]
   b77d7fd9  addsd xmm6, xmm5
   b77d7fdd  addsd xmm7, xmm6
   b77d7fe1  add edi, +0x01
   b77d7fe4  cmp edi, 0x00989680
   b77d7fea  jle 0xb77d7fb0	->LOOP
   b77d7fec  jmp 0xb77d0014	->3
   ---- TRACE 1 stop ->  loop

I suppose the optimizer could recognize the adjacent sin/cos calls in
the IR and merge them to fsincos.  If compiling sincos to SSE2, you
might need a library like .
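
To make the merging idea concrete, here is a rough sketch in C over a
toy IR.  Everything in it (ToyOp, ToyIns, toy_merge_sincos) is
hypothetical and is not LuaJIT's actual IR or optimizer; it only shows
the pattern match: a sin immediately followed by a cos of the same
operand becomes one combined instruction plus a cheap "take the cosine
from that pair" instruction, so later uses of either slot stay valid.

```c
#include <stddef.h>

/* Toy IR (an assumption for illustration, not LuaJIT's real layout). */
typedef enum {
    TOY_SIN,     /* sine of ir[arg]                                 */
    TOY_COS,     /* cosine of ir[arg]                               */
    TOY_SINCOS,  /* computes both; the slot's own value is the sine */
    TOY_COSOF,   /* cosine produced by the TOY_SINCOS at ir[arg]    */
    TOY_OTHER    /* any other instruction                           */
} ToyOp;

typedef struct {
    ToyOp op;
    int   arg;   /* index of the operand instruction */
} ToyIns;

/* Merge each adjacent (sin x, cos x) pair in place.
   Returns the number of pairs merged. */
size_t toy_merge_sincos(ToyIns *ir, size_t n) {
    size_t merged = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (ir[i].op == TOY_SIN && ir[i + 1].op == TOY_COS &&
            ir[i].arg == ir[i + 1].arg) {
            ir[i].op = TOY_SINCOS;      /* still yields the sine here */
            ir[i + 1].op = TOY_COSOF;   /* reads cosine from slot i   */
            ir[i + 1].arg = (int)i;
            merged++;
            i++;                        /* skip the rewritten cos     */
        }
    }
    return merged;
}
```

This only handles the sin-then-cos order seen in the trace above; a real
pass would also have to cope with the reverse order and with
non-adjacent instructions.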

This all doesn't seem to make a whole lot of difference though.
___libm_sse2_sincos is actually a little slower than the fsincos here
and the speedup is only maybe 30% over the separate fsin/fcos
instructions, but it depends on your library implementation and its
accuracy level.  It may make a bit more difference in standard Lua,
and the lqd binding has one [3].  Even Lua has the somewhat related
math.atan2, though not for the same reasons.  Here's an example of it
added to lmathlib.c:

   static int math_sincos (lua_State *L) {
     lua_Number x = luaL_checknumber(L, 1);
     lua_pushnumber(L, l_tg(sin)(x));
     lua_pushnumber(L, l_tg(cos)(x));
     return 2;
   }
