• Subject: integer implementation of mod
• From: Rici Lake <lua@...>
• Date: Thu, 19 May 2005 11:01:32 -0500

The following code is hereby released into the public domain (although I'm never adverse to getting credit):
```
static inline lua_Number imod_aux(lua_Number n, lua_UNumber absm) {
lua_Number res;
if ((absm & (absm ^ mask)) == absm) {
if (absm == 0) overflow_error();
}
else if (n < 0)
res = mask - (lua_UNumber)(~n) % absm;
else
res = (lua_UNumber)n % absm;
return res;
}

lua_Number imod(lua_Number n, lua_Number m) {
if (m < 0) {
lua_Number r = imod_aux(n, (lua_UNumber)(-m));
return r ? m + r : r;
}
else
return imod_aux(n, (lua_UNumber)m);
}

------------

```
This has the same behaviour with respect to signs as the work6 implementation of %; specifically, p % q is either in the range [q+1, 0] or [0, q-1] depending on whether q is negative or positive. I'm not standing up for this particular definition, by the way, just trying to be consistent.
```
```
The code has been carefully optimised for the case of p % 2^k, but a check against an implementation based on the Python implementation (which uses / and *) reveals that it is still faster on both x86 and ppc32 architectures, significantly faster in the former case.
```
Some explanation is possibly in order:

```
m xor m-1 effectively turns all trailing 0's in the binary representation of m into 1's, and turns all bits to the left of the rightmost 1 into 0's. Consequently, m bitand (m xor m-1) is exactly equal to m iff m is a power of 2. If m is a power of 2, then n mod m can be computed by bitand'ing n with m-1.
```
```
The ~n in line 10 is simply based on: ~n === -(n + 1). The version of gcc I used didn't find this optimisation, for some reason. The point of line 10 was to avoid an extra test for a zero result; I couldn't come up with a workable formulation of the same optimisation in line 18, and the result is evident from timing tests.
```
```
I'm not sure that the use of unsigned % is actually necessary; in fact, both arguments to the % operator in both lines 10 and 12 are guaranteed to be positive. However, I had at the back of my mind the thought that unsigned mod might produce better compiled output on some architectures. That might be completely wrong, though.
```
```