lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


In number theory and algebra, the notation and concepts are sometimes developed as follows:

The first concept is divisibility. a | b is a proposition, representing that the integer a divides the integer b. In C you would usually write this test as (a % b == 0), but for purposes of math, you'd define an study | before talking about modulo probably. For instance, you can see that | defines a partial order on the integers. So, if a | b and b | c, then a | c. Also a | a, and if both a | b and b | a, then a = b.

The second concept is congruence modulo a number. In math it's usually notated:

a = b mod c

except instead of using =, you would use 3 lines, for instance, unicode 2261.

Here "mod" is not a binary operator being applied to b. It's rather a qualifier which modifies the meaning of =, you should [parse this basically as a special ternary operator. However, usually c is not changing. It might have been better if we notated thins like this

a =_c b

where c is a subscript modfying the equals symbol.

The meaning of this notation is, c | (a - b)
that is, that c divides the difference of a and b.

For any number a, the "congruence class " of a modulo c is the set of all b which is it is congruent to. Congruence is an equivalence relation: It's always the case that a = a mod c, because 0 is divisible by every number. Also, if a = b mod c then b = a mod c, and if a = b mod d and b = c mod d, then it's easy to see a = c mod d.

So, for any fixed number c, the set of all integers is partitioned into a collection of disjoint sets, the congruence classes modulo c. There are exactly c such classes, corresponding to the integers between 0 and c. Every other number is congruent to one of those -- you can figure out which one by taking the remainder after dividing by c, which is the conventional definition of "modulo".

One of the basic ideas of algebra is that just as we can add and multiply two numbers, it turns out that we can add and multiply two congruence classes, and the result will be unambiguously a congruence class. This is a bit counterintuitive at first. To do it, we simply take any element of the first congruence class, and any element of the second congruence class, add or multiply them as required, then compute the congruence class of the result. Naively, different choices could lead to different outcomes, so this wouldn't be well-defined, but it turns out that it always produces the same answer.

Let's introduce some new notation: Say that a mod c is a binary operation which takes two integers, and produces a set of integers, the congruence class of a mod c. This map is well-studied in group theory -- it is the canonical map from the integers Z to the quotient group Z / cZ. (The quotient group Z / cZ is the group whose elements are the congruence classes of Z mod c.)

If we do it that way, then we can reinterpret the standard notation "a = b mod c" as a succinct way of writing "a mod c = b mod c", that is, a and b are congruent if they correspond to the same congruence class.

Then, the most important properties of congruence and modulo can be written as follows:

(a + b) mod c = (a mod c) + (b mod c)
(a * b) mod c = (a mod c) * (b mod c)

These identities are extremely important, they enable computational shortcuts when you only care about the answer modulo some number. For instance, suppose I have some polynomial x^2 + 100x - 57, and I want to evaluate it mod 5 at some value x = 1701. Because of these identities, I can always reduce the input and any intermediate result mod 5 eagerly throughout the computation, without changing the final result. So

(1701)^2 + 100*1701 - 57 mod 5

= ((1701 mod 5)^2 mod 5) + (100 * 1701 mod 5) - 57 mod 5
= 1 + 0 - 2 mod 5 = - 1 mod 5 = 4 mod 5

This basic idea underlies important algorithms like fast exponentiation by repeated squaring. It's used essentially in Diffie-Hellman key exchange for instance, and it has numerous other applications in computer science.

In group theory and algebra, the quotient group construction is greatly generalized, and applies to lots of other situations. "Properly", algebraists think of "modulo" as yielding a congruence class, not a particular number.

In some arguments it is necessary to find "congruence class representatives", so you might say, the numbers 0, ..., c-1 are a set of representatives for the congruence classes mod c, and you might want a map that maps a number to its unique congruence class representative.

The modulo operation which appears in programming languages is an example of such a map, it's certianly useful. It produces for each number the remainder after division, which is in particular a "simpler" member of the congruence class.

But note that, unfortunately, the way that it is implemented in C gives unfortunate semantics.

For instance, it's easy to see that 1 is congruent to -1 modulo 2, because they differ by 2.

But if I write it in C, as `1 % 2 == -1 % 2`, I might get false instead of true, because the result for the first may be positive while the result for the second may be negative.

That's stupid and serves no useful purpose. Modulo would be far more natural and useful if (a % c == b % c) meant the same thing as "a is congruent to b mod c". Instead, I have to write ((a - b % c) == 0). Modulo should return the same value for any two numbers which are congruent, computing a *unique* set of congruence representatives. When it can return any value between -c and c that's just an unfortunate design flaw.

I believe that this behavior creates subtle bugs and problems in real programs, and might prevent the compiler from making some natural transformations that could be useful to make. For instance, it would be great if ((x * y + z * w) %) could be analyzed and rewritten based on the standard modulo identities, but usually, the possibility of negative values in modulo means that this is not exactly the same as ((x * y) % p + (z * w) % p) % p, so even if the compiler could generate better code for one form, or exploit knowedge that z or w = 0 mod p for instance, to simplify the _expression_, it will usually be afraid to do so.

So, in summary, it would be far far better if (a % b) is always at least 0 and less than b, and undefined / an error for b = 0, or negative b. It's not an "advantage" or "more natural" to return negative values when a is negative. This is because in the modular ring "Z / bZ", there really aren't any negative numbers. Every negative number is congruent to a positive number, and you can't separate them usefully into two distinct groups like you can with the integers and the rationals and so on. There are no "inequalities" in modular arithmetic, nor any negative numbers, at least not with any useful or interesting algebraic structure. It would be cleaner, simpler, more elegant, and probably more useful and efficient if modulo never returned negative numbers.

Best,
Chris

On Thu, Feb 9, 2017 at 6:18 AM, Martin <eden_martin_fuhrspam@gmx.de> wrote:
On 02/08/2017 02:40 PM, chris beck wrote:
>> Hmmm... So why does my programmers calculator, Go, _javascript_, etc,
> etc all return -3 in this case?
>>
>> Personally I think it's a bug, as "2" is not the correct result for
> that _expression_ (in any language but Lua it seems).
>
> For what it's worth, among anyone who was a math major or took a course
> in algebra in college, the congruence class of -2 mod 5 is 3
[snip]

It depends on additional restrictions of mod() results.

Gerenally

mod(n, m) is some function like
  : get x, y such that
  :   x * m + y == n,
  :   int(x), int(y), abs(y) < abs(m)
  : return y

And further restrictions possible:
  : x = min(possible x) // x leftmost from zero
  => mod(-2, -5):  0 * -5 + -2 => -2         (1)
  => mod( 2, -5): -1 * -5 + -3 => -3
  => mod(-2,  5): -1 *  5 +  3 =>  3
  => mod( 2,  5):  0 *  5 +  2 =>  2

  : x = max(possible x) // x rightmost from zero
  => mod(-2, -5):  1 * -5 +  3 =>  3         (2)
  => mod( 2, -5):  0 * -5 +  2 =>  2
  => mod(-2,  5):  0 *  5 + -2 => -2
  => mod( 2,  5):  1 *  5 + -3 => -3

  : y = min(possible y) // y leftmost from zero
  => mod(-2, -5):  0 * -5 + -2 => -2         (3)
  => mod( 2, -5): -1 * -5 + -3 => -3
  => mod(-2,  5):  0 *  5 + -2 => -2
  => mod( 2,  5):  1 *  5 + -3 => -3

  : y = max(possible y) // y rightmost from zero
  => mod(-2, -5):  1 * -5 +  3 =>  3         (4)
  => mod( 2, -5):  0 * -5 +  2 =>  2
  => mod(-2,  5): -1 *  5 +  3 =>  3
  => mod( 2,  5):  0 *  5 +  2 =>  2

And even more:
  : x = min(abs(possible x)) // x towards zero
  => mod(-2, -5):  0 * -5 + -2 => -2         (5)
  => mod( 2, -5):  0 * -5 +  2 =>  2
  => mod(-2,  5):  0 *  5 + -2 => -2
  => mod( 2,  5):  0 *  5 +  2 =>  2

  : x = max(abs(possible x)) // x away zero
  => mod(-2, -5):  1 * -5 +  3 =>  3         (6)
  => mod( 2, -5): -1 * -5 + -3 => -3
  => mod(-2,  5): -1 *  5 +  3 =>  3
  => mod( 2,  5):  1 *  5 + -3 => -3

  : y = min(abs(possible y)) // y towards zero
  => mod(-2, -5):  0 * -5 + -2 => -2         (7)
  => mod( 2, -5):  0 * -5 +  2 =>  2
  => mod(-2,  5):  0 *  5 + -2 => -2
  => mod( 2,  5):  0 *  5 +  2 =>  2

  : y = max(abs(possible y)) // y away zero
  => mod(-2, -5):  1 * -5 +  3 =>  3         (8)
  => mod( 2, -5): -1 * -5 + -3 => -3
  => mod(-2,  5): -1 *  5 +  3 =>  3
  => mod( 2,  5):  1 *  5 + -3 => -3

  /*
    Due some reason results of (5) and (7) are equal.
    Same for results (6) and (8).
  */

So at least six different variants possible. Which one you prefer and why?

-- Martin