lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


D Burgess wrote:
> Seems that  this is the 14/15 digit problem with doubles. How does
> fmod do it? At 2^54 we are beyond double capacity to show all digits.
> So how does it know what the remainder is? Some form of
> optimized arithmetic?

By using an iterative subtract-and-shift division algorithm.

A good example is the x87 FPU instruction 'fprem': it calculates
64 bits(*) in one pass. A flag is set if this was not sufficient.
You're supposed to run it in a loop, like this:

|  fprem      // [x y] -> [rem y] or [partial-rem y]+C2 flag set
|  fnstsw ax
|  sahf
|  jp <1      // Retry if C2 flag is set.

(*) The x87 FPU registers have a higher internal precision.

The situation is worse for all the platforms that don't have good
hardware support for it. They are probably using a variant of:
... which is dead slow unless rewritten in native assembler.
Alas, glibc and the BSD libc's only have native versions for
x86/x87, x64 and ia64. Seems floating point mod is not very
popular (unlike integer mod).

My guess: This was the reason why the Lua authors decided to use
x-floor(x/y)*y. It's a lot faster on most platforms and more or
less the same for x86/x87 (since fmod is usually not inlined).

But it's very imprecise if the exponents of the two arguments
differ by more than the precision of the mantissa. I still think
it's 'good enough' for most use cases (args in the int32 range).
Anyone who really needs the precision should use math.fmod (it's
not going away -- but the compatibility math.mod alias is).

BTW: Rici mentioned a copysign() workaround. But IMHO this
doesn't work?

  print(2%-3, -2%3, math.mod(2,-3), math.mod(-2,3))
  -->   -1      1             2               -2

It would also be pretty slow because copysign modifies the
highest bit of a 64 bit double with 32 bit integer ops (at least
on x86/x87). This incurs storing and retrieving it from memory
(no direct access to FPU regs from the integer path) and several
store-to-load forwarding stalls.