Re: Lua 5.1 (beta) now available

lua-l archive

Subject: Re: Lua 5.1 (beta) now available
From: Rici Lake <lua@...>
Date: Fri, 18 Nov 2005 18:45:14 -0500


On 18-Nov-05, at 12:18 PM, Mike Pall wrote:

>> On the other hand, I've certainly seen C compilers (though I
>> admit not for a long time) which would cheerfully optimize away the
>> (a)!=(a) check (which certainly should be optimized away if a is an
>> integer type.)
>
> It's clearly a violation of the standard to optimize this
> comparison away for floating point numbers. Abandon all
> hope that such a compiler gets the other subtle issues
> of FP arithmetic right.

Which standard would that be? :) All I see in the C standard is that the value of a comparison or equality operator is "1 if the specified relation is true and 0 if it is false". Even C99 does not mandate the use of IEEE-754 floating point, and it is not intrinsic to floating point that either (1) there is such a thing as "not a number" or (2) if there is, that it tests unequal to itself.

OK, to be fair, C99 does say that if the implementation purports to implement IEEE-754 floating point (by defining __STDC_IEC_559__), then it has to make == and != work that way. On the other hand, in that case, it also has to define isunordered(x, y). And, curiously, gcc (at least on x86) *does* inline isunordered(x, x) even though it does not inline isnan(x) (which is semantically identical). Go figure.

(gcc does not seem to define __STDC_IEC_559__, though. Perhaps the implementation isn't considered complete yet. So it's under no obligation to honour the definition of ==. See below.)
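Whether a given compiler makes that claim can be checked directly. The following is a minimal sketch, not something from the tests reported here; the function name claims_iec_559 is made up for illustration, and the answer depends entirely on the compiler, its version and its flags.

```c
/* Hypothetical helper: reports whether this implementation defines
 * __STDC_IEC_559__, i.e. whether it claims the IEEE-754 semantics
 * that C99 ties to that macro. */
int claims_iec_559(void) {
#ifdef __STDC_IEC_559__
  return 1;   /* the implementation claims conformance */
#else
  return 0;   /* no claim, so no promise about how == and != treat NaN */
#endif
}
```

A result of 0 only means that no promise has been made; the comparison may still behave the IEEE-754 way in practice.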


So I timed the following three little snippets in a hard loop:

#include <math.h>   /* isunordered(), isnan() (C99) */

double uno(double x) {
  if (isunordered(x, x)) return 0;
  else return x + 1;
}

double isn(double x) {
  if (isnan(x)) return 0;
  else return x + 1;
}

double cmp(double x) {
  if (x != x) return 0;
  else return x + 1;
}

using 0.0 and some NaN as an argument. Results: (nanoseconds per iteration, timed with 100,000,000 iterations). (Remember when you could benchmark without counting zeros? :)


isnan(x)
  NaN      |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx  107.8
  0.0      |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx     101
x != x
  NaN      |xxxxxxxxxx     19.6
  0.0      |xxxxxxxxxxxx   24.5
isunordered(x, x)
  NaN      |xxxxxxxxxxxx   24.2
  0.0      |xxxxxxxxxx     19.0

Conclusion: on gcc/x86 (and possibly other platforms), the best test is 'isunordered(x,x)'; it's apparently faster in the common case, and it is semantically correct.
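For anyone who wants to repeat the measurement, a timing loop of roughly this shape would do. It is a sketch under assumptions, not the harness used for the numbers above: ns_per_call is a hypothetical name, clock() is just one possible timer, and calling through a function pointer adds overhead of its own, so absolute figures will differ.

```c
#include <math.h>
#include <time.h>

/* One of the three variants from the post, repeated so the sketch
 * compiles on its own. */
double cmp(double x) {
  if (x != x) return 0;
  else return x + 1;
}

/* Hypothetical harness: average nanoseconds of CPU time per call of
 * f(arg), measured over `iterations` calls. */
double ns_per_call(double (*f)(double), double arg, long iterations) {
  volatile double sink = 0;   /* volatile store keeps each call alive */
  clock_t start = clock();
  for (long i = 0; i < iterations; i++)
    sink = f(arg);
  clock_t stop = clock();
  (void)sink;
  return (double)(stop - start) / CLOCKS_PER_SEC * 1e9 / (double)iterations;
}
```

Something like ns_per_call(cmp, 0.0, 100000000L) and ns_per_call(cmp, NAN, 100000000L) would mirror the two rows reported for each test.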

Now, gcc offers the interesting optimization flag -ffast-math. You should never use this flag. We all know that, right? But I suppose people do. Anyway, I tried it. Two interesting things arise:

First, with -ffast-math, gcc optimizes away the 'x != x' test. Of course, it never claimed to be IEEE-754 compliant, so it's allowed to do that.

Second, with -ffast-math, gcc also optimised away the call to isunordered() (!).

Finally, -ffast-math is anything but fast when presented with NaNs. I ran the same tests as above, but the results won't fit on the width of the email:


isnan(x)
  NaN      |-----------------------------------------------------> 4247
  0.0      |xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    104
x != x
  NaN      |---->        3933* (wrong answer)
  0.0      |xxxxxxx        13.2
isunordered(x, x)
  NaN      |---->        3933* (wrong answer)
  0.0      |xxxxxxx        13.1

So optimising away the tests does produce the right answer faster. But why is it so slow to produce the wrong answer? And why does the slowness affect a non-inlined call to isnan() as well?

The answer to the first question is, of course, the pathetic handling of NaNs by Pentiums. Since the check for NaN has been removed from the code, the addition NaN+1.0 takes place; this is disastrously slow on a Pentium 4.

Examination of the assembly shows that this is very similar to the isnan(x) case. In a vain attempt to squeeze every microcycle out of the Pentium 4, gcc moves the addition to *prior* to the test of the result of isnan(x). That is, it does the addition regardless of whether it needs the value, because "it can't hurt". One presumes that it wouldn't have done that had it been a division instead of an addition, and it certainly does not do it without -ffast-math, presumably because in that case it knows that the addition might change an fp exception flag. But the joke's on gcc in the end; the "unnecessary but harmless" addition ends up costing a 40x slowdown.
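Written back as C, the reordering described above amounts to something like the second function below. This is a reconstruction from the description, not gcc's actual output; the names isn_source and isn_hoisted are invented for the comparison.

```c
#include <math.h>

/* The isnan() variant as written in the post. */
double isn_source(double x) {
  if (isnan(x)) return 0;
  else return x + 1;
}

/* Roughly what the generated code does with -ffast-math: the addition
 * is performed first, whether or not its result is needed. */
double isn_hoisted(double x) {
  double sum = x + 1;       /* executed even when x is NaN */
  if (isnan(x)) return 0;   /* the test only selects the result */
  return sum;
}
```

Both functions return the same values; the difference is only that the second always pays for the addition, which is where the slow NaN arithmetic comes in.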
