[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
**Subject**: **integer implementation of mod**
**From**: Rici Lake <lua@...>
**Date**: Thu, 19 May 2005 11:01:32 -0500

`The following code is hereby released into the public domain (although
``I'm never adverse to getting credit):
`
static inline lua_Number imod_aux(lua_Number n, lua_UNumber absm) {
lua_UNumber mask;
lua_Number res;
mask = absm - 1;
if ((absm & (absm ^ mask)) == absm) {
res = n & mask;
if (absm == 0) overflow_error();
}
else if (n < 0)
res = mask - (lua_UNumber)(~n) % absm;
else
res = (lua_UNumber)n % absm;
return res;
}
lua_Number imod(lua_Number n, lua_Number m) {
if (m < 0) {
lua_Number r = imod_aux(n, (lua_UNumber)(-m));
return r ? m + r : r;
}
else
return imod_aux(n, (lua_UNumber)m);
}
------------

`This has the same behaviour with respect to signs as the work6
``implementation of %; specifically, p % q is either in the range [q+1,
``0] or [0, q-1] depending on whether q is negative or positive. I'm not
``standing up for this particular definition, by the way, just trying to
``be consistent.
`

`The code has been carefully optimised for the case of p % 2^k, but a
``check against an implementation based on the Python implementation
``(which uses / and *) reveals that it is still faster on both x86 and
``ppc32 architectures, significantly faster in the former case.
`
Some explanation is possibly in order:

`m xor m-1 effectively turns all trailing 0's in the binary
``representation of m into 1's, and turns all bits to the left of the
``rightmost 1 into 0's. Consequently, m bitand (m xor m-1) is exactly
``equal to m iff m is a power of 2. If m is a power of 2, then n mod m
``can be computed by bitand'ing n with m-1.
`

`The ~n in line 10 is simply based on: ~n === -(n + 1). The version of
``gcc I used didn't find this optimisation, for some reason. The point of
``line 10 was to avoid an extra test for a zero result; I couldn't come
``up with a workable formulation of the same optimisation in line 18, and
``the result is evident from timing tests.
`

`I'm not sure that the use of unsigned % is actually necessary; in fact,
``both arguments to the % operator in both lines 10 and 12 are guaranteed
``to be positive. However, I had at the back of my mind the thought that
``unsigned mod might produce better compiled output on some
``architectures. That might be completely wrong, though.
``
`