# Floating Point

## Intro

Now and then, someone on the Lua mailing list asks roughly, "Is it OK to use floating point arithmetic for integer applications?" This discussion intends to allay such fears, without misrepresenting the problems that do exist.

The main problem with floating point being Lua's only numeric type is that most programmers do not understand floating point. Floating point numbers are A LOT more complicated than integers (fixed point numbers). Most programmers' mental model of floating point is that the result of an operation sometimes carries a very slight error, and as a consequence people get jittery. But modeling floating point arithmetic as "correct answer + noise" is wrong, particularly for the most common type of floating point, IEEE 754.

The solution to this whole problem is education. The essential integer-backwards-compatible behavior of double floating point should be emphasized more in the documentation, the website, the FAQ, the wiki, etc.

Before going further, it should be noted that although floating point numbers are often used as the numeric type in Lua, Lua does not depend on floating point but can be compiled with any numeric type, such as single precision floating point, integer, or even some very unfamiliar numerical representations[*5]. Of course, the semantics of division, for example, change significantly if you compile Lua with an integer numeric type, but on an embedded processor that would probably not come as a surprise.[*1] ARM, Coldfire, and various flavors of embedded MIPS have no FPU, yet these are _very_ widely used.[*2]
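To make the division point concrete, here is a small C sketch contrasting double division, C's truncating integer division, and floor division (the rounding used by Lua 5.3's `//` operator). The function names are hypothetical, chosen only for this illustration; they are not part of Lua's API.

```c
/* Double division: 7 / 2 gives 3.5, as in a default (double) Lua build. */
double real_div(long a, long b) {
    return (double)a / (double)b;
}

/* Plain C integer division truncates toward zero (C99 and later).
   Precondition: b != 0 and not (a == LONG_MIN && b == -1). */
long trunc_div(long a, long b) {
    return a / b;
}

/* Floor division rounds toward minus infinity, as Lua 5.3's // does.
   Same precondition as trunc_div. */
long floor_div(long a, long b) {
    long q = a / b;
    /* With a nonzero remainder and operands of opposite sign, truncation
       rounded the quotient up, so step down by one. */
    if (a % b != 0 && ((a < 0) != (b < 0))) {
        q -= 1;
    }
    return q;
}
```

For example, with a = -7 and b = 2 the three functions give -3.5, -3 and -4 respectively, which is exactly the kind of semantic shift an integer build introduces.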

## Accuracy

If you have a normal, modern (circa 2000) desktop computer, then you have IEEE 754 double-precision floating point.[*4]
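If you want to check this on a particular machine, the C standard headers expose enough to make a reasonable guess. The sketch below tests the `<float.h>` parameters that an IEEE 754 double must have, and separately reports whether the implementation defines `__STDC_IEC_559__`, the C99 Annex F macro that promises IEEE 754 (IEC 60559) arithmetic. The function names are hypothetical, and passing the parameter check is strong evidence rather than proof.

```c
#include <float.h>

/* Returns 1 if double has the parameters of IEEE 754 binary64:
   radix 2, a 53-bit significand, the binary64 exponent range, 8 bytes. */
int double_looks_like_ieee754(void) {
    return FLT_RADIX == 2
        && DBL_MANT_DIG == 53
        && DBL_MAX_EXP == 1024
        && DBL_MIN_EXP == -1021
        && sizeof(double) == 8;
}

/* Returns 1 if the C implementation explicitly claims IEC 60559
   (IEEE 754) support. Some compilers running on IEEE 754 hardware
   still leave this macro undefined, so 0 is not conclusive. */
int implementation_claims_iec559(void) {
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}
```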

Here are some points about (IEEE 754) 64-bit double-precision floating point numbers ("doubles") vs. 32-bit integers:

• A double can represent many more integers exactly than a 32-bit int.
• In fact, every value a 32-bit int can represent, a double can represent exactly.
• Every exact result that 32-bit integer arithmetic can compute (no overflow, no truncated division), double arithmetic computes exactly as well.
• Range of a double: a 64-bit double can represent every integer exactly from -2^53 to 2^53 (about +/-9,000,000,000,000,000), so the largest power of ten below which all integers are exact is 10^15 (1,000,000,000,000,000). [*3]
• Range of an int: a 32-bit int can represent every integer from -2^31 to 2^31 - 1 (about +/-2,000,000,000), so the largest such power of ten is 10^9.
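The exact boundary can be checked directly. Integers up to 2^53 in magnitude survive a round trip through double unchanged; 2^53 + 1 is the first positive integer that does not, because it lies halfway between 2^53 and 2^53 + 2 and rounds to 2^53. The helper below is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Returns 1 if converting n to double and back yields n again. */
int survives_double_round_trip(int64_t n) {
    double d = (double)n;
    /* Large int64 values can round up to 2^63, which does not fit back
       into int64_t; converting it would be undefined behaviour. */
    if (d >= 0x1p63) {
        return 0;
    }
    return (int64_t)d == n;
}
```

Every 32-bit int passes this test, which is the "backwards-compatibility" property described in the list.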

Summary: 64-bit double-precision floating point arithmetic is better than 32-bit integer arithmetic from the point of view of accuracy, range, and 'backwards-compatibility'. See the Caveats section, though.

### Arithmetic operations

IEEE 754 requires the basic arithmetic operations, + - * (multiplication) and / (division), to produce results that are as close as possible to the exact mathematical answer. In particular, if the answer is exactly representable as a double-precision floating point number, then that will be the answer; otherwise, there are two double-precision numbers nearest the exact answer (one just above and one just below), and one of those two is selected according to the rounding mode (by default, round to nearest, with ties going to the value whose last bit is even).

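For operations on integers, most operations are exact: as long as the operands and the exact result are integers no larger than 2^53 in magnitude, addition, subtraction and multiplication give exactly that result, and so does division whenever the quotient is itself such an integer. If I compute 2+5, then because the exact mathematical answer is 7, that is also the IEEE 754 answer.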
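A small C check of this claim: for a pair of 32-bit integers, whenever the int result is exact (no overflow, and for division no remainder), the double result is identical. The function name is hypothetical; the reference results are computed in 64-bit integers, which cannot overflow for 32-bit operands.

```c
#include <stdint.h>

/* Returns 1 if double arithmetic reproduces every exact 32-bit integer
   result for a and b: sums, differences and products that do not
   overflow, and quotients that have no remainder. */
int double_matches_int32(int32_t a, int32_t b) {
    /* Exact reference results; int64_t cannot overflow for int32 inputs. */
    int64_t exact[3] = { (int64_t)a + b, (int64_t)a - b, (int64_t)a * b };
    double via_double[3] = { (double)a + b, (double)a - b, (double)a * b };
    for (int i = 0; i < 3; i++) {
        if (exact[i] < INT32_MIN || exact[i] > INT32_MAX) {
            continue; /* int32 arithmetic would overflow: no exact int result */
        }
        if (via_double[i] != (double)exact[i]) {
            return 0;
        }
    }
    /* Division: compare only when the int quotient is exact. The guard
       also skips INT32_MIN / -1, which overflows int32. */
    if (b != 0 && !(a == INT32_MIN && b == -1) && a % b == 0) {
        if ((double)a / (double)b != (double)(a / b)) {
            return 0;
        }
    }
    return 1;
}
```

Quotients with a remainder, such as 7 / 2, are deliberately skipped: integer division gives 3, while double division gives the exact answer 3.5.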

## Performance

### Big CPUs

Regarding performance, most serious modern desktop CPUs these days can process double-precision floating point as fast as or faster than integers. This includes modern MIPS R5000 and modern PPC 700 and better. Common FPU operations (add, subtract, compare, multiply) have a throughput of one per clock. (Better still, a multiply-add instruction may well achieve one-per-clock throughput in the FPU alone, making it faster than an integer multiply-add in the ALU. Multiscalar architectures and SIMD can improve floating point even further, though this is probably not relevant to Lua.) Often, floating point multiply is faster than integer multiply, because floating point multiply is used more often and CPU designers spend more effort optimising that path. Floating point divide may well be slow (often more than 10 cycles), but so is integer divide.

Admittedly Intel's Pentium design comes a poor third (because it has too few FP registers). But Intel is catching up.

The only remaining performance concern is floating point to integer conversion. Like it or not, memory load instructions operate with an address and a byte offset (i.e. two integer values), so any performance savings from using floating point instead of integers are for naught if the CPU's float-to-int conversion is slow. Some claims state that float-to-int conversion can have a devastating effect on performance. (For gcc 3.4 and a circa 2006 AMD processor, the problem is gone, but test it for yourself. It is easy to compile the benchmark from that link.)
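One long-standing way to sidestep a slow native double-to-int conversion is the "magic number" trick sketched below: adding 1.5 * 2^52 to a double in the int32 range fixes the exponent, so the rounded integer lands in the low 32 bits of the significand, where plain integer operations can read it. This is a sketch under stated assumptions: IEEE 754 doubles, the default round-to-nearest mode, no excess precision in intermediate results (true for SSE2 code, not necessarily for x87), and double and `uint64_t` sharing one byte order. Unlike a C cast it rounds to nearest instead of truncating. The function name is hypothetical; benchmark it against a plain cast on your own CPU before relying on it.

```c
#include <stdint.h>
#include <string.h>

/* 1.5 * 2^52: adding this to any |d| < 2^31 yields a double whose
   exponent is fixed and whose low 32 significand bits hold round(d)
   in two's complement form. */
#define ROUNDING_MAGIC 6755399441055744.0

/* Rounds d to the nearest int32 (ties to even under the default rounding
   mode). Precondition: -2^31 <= d <= 2^31 - 1. */
int32_t fast_round_to_int32(double d) {
    double shifted = d + ROUNDING_MAGIC;
    uint64_t bits;
    memcpy(&bits, &shifted, sizeof bits); /* reinterpret the bit pattern */
    uint32_t low = (uint32_t)(bits & 0xFFFFFFFFu);
    /* Interpret the low 32 bits as two's complement without relying on
       implementation-defined unsigned-to-signed conversion. */
    int64_t value = (int64_t)low;
    if (value > INT32_MAX) {
        value -= (int64_t)1 << 32;
    }
    return (int32_t)value;
}
```

A plain `(int32_t)d` cast truncates toward zero instead, so the two disagree for values such as -2.6 (the cast gives -2, the trick gives -3).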

### Memory

For users of these serious modern desktop CPUs, the only major remaining potential issues are memory bandwidth and memory usage. Presumably these considerations are irrelevant in practice, because of the size of the cell (a union structure) that Lua uses to store each value.

Hard data would be good here, such as the `sizeof()` of the appropriate union when configured with single and with double precision floating point, on one or more architectures.

-- In the world of 8-byte structure alignment, they don't really take any more space, though if the object structure ever gets over 64 bits due to the size of a FP number, the size issue is "durn tootin" relevant. FP numbers also take more room in CPU caches -- it's not just registers that suffer. -- ChuckAdams
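As a starting point for that hard data, this sketch prints the size of a simplified tagged-value cell configured with each numeric type. The layout (a union of the number and a pointer, plus an int tag) is a hypothetical stand-in loosely modeled on how a Lua value is stored, not Lua's actual definition, so treat the numbers as indicative only.

```c
#include <stdio.h>

/* Hypothetical value cells: a union holding either the number or a
   pointer (for strings, tables, etc.), plus a type tag. */
typedef struct { union { double n; void *p; } value; int tag; } CellDouble;
typedef struct { union { float  n; void *p; } value; int tag; } CellFloat;
typedef struct { union { int    n; void *p; } value; int tag; } CellInt;

/* Prints the size of each cell variant on the current platform. */
void print_cell_sizes(void) {
    printf("sizeof(double) = %zu, sizeof(void *) = %zu\n",
           sizeof(double), sizeof(void *));
    printf("double cell: %zu bytes\n", sizeof(CellDouble));
    printf("float  cell: %zu bytes\n", sizeof(CellFloat));
    printf("int    cell: %zu bytes\n", sizeof(CellInt));
}
```

On a typical 64-bit (LP64) platform the pointer member already makes each union 8 bytes, so one would expect all three cells to come out the same size; on a 32-bit platform the double cell is where the difference shows.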

### Small CPUs

There are users of older, smaller, or embedded CPUs that have no floating point, no 64-bit floating point, CPU-emulated floating point, or slow hardware floating point. These include the 80386, 68040, ARM, etc. Hopefully the configurable number type in `luaconf.h` fills their needs.

Summary: circa 2001, your average entry-level $10 CPU chip is capable of computing with 64-bit doubles as fast and as accurately as with integer arithmetic (or more so). Users of these CPUs should not pay any cost for Lua using only double floating point, and may reap benefits. Users of smaller chips may well be obliged to use integer arithmetic only in Lua.

-- As also noted above, most ARM chips, among the most widely used embedded CPUs, have no hardware floating point support. This means every conversion between int and float, and every arithmetic operation, is a function call (usually inlined). Doubles are even worse. When you consider the checks for NaN and denormalized numbers, the overhead vs. integers is quite significant (and yes, I have profiled). --BenStJohn

While Lua's number type can be built as an integer type, profiling is suggested before doing so, since (even on integer-only processors) the speed-up might not be as great as imagined, and the loss of floating point can make some operations more complex. Also note that Lua 5.3 has explicit integer support built in.

eLua (http://www.eluaproject.net/) is a fork of Lua for some microcontrollers that has an optional integer build. However, standard Lua also builds easily on cores like the ARM7TDMI, since its source is standard C; the integer changes can be applied if necessary.

## Caveats

Your mileage may vary. Example causes:
• If your C code passes huge numbers of integers to Lua, that may be slow on your architecture. Or it may create predictability problems on your architecture.
• You might really need integer division, for example; a few (very few) algorithms really do. Additionally, you might find that a proper floating point truncation is slower than its integer equivalent, and that this algorithm is time-critical for your application.
• Some vendors' `printf` implementations may not print floating point numbers accurately. Believe it or not, some may incorrectly print integers stored as floating point numbers, which shows up as Lua printing some numbers incorrectly. QNX has this problem. The only sensible fix is to complain to your vendor and/or replace `printf` with one that works.
• If what you really need is 64-bit signed or unsigned integers, then a 64-bit floating point number (i.e. a double) may be an issue, since it has only 53 bits of precision. For example, the C code on the other side of the C API might need a 64-bit signed or unsigned int (i.e. long long) for a random number, a key, a serial number, or a field going to or from a network packet. In that case you're better off having the C code hold the value in a Lua userdata and provide methods/metamethods for manipulating the 64-bit value.
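For the `printf` caveat, a quick self-test can flag a C library that formats integer-valued doubles incorrectly. This hypothetical check compares `%.0f` output for a double against `%lld` output for the same integer. Lua formats numbers with the format configured as `LUA_NUMBER_FMT` in `luaconf.h`, so this exercises the C library rather than Lua itself; the sample values are an arbitrary assumption.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if printf formats the integer-valued double (double)n the
   same way it formats n as a long long. Assumes |n| <= 2^53. */
int printf_agrees(long long n) {
    char as_double[64], as_integer[64];
    snprintf(as_double, sizeof as_double, "%.0f", (double)n);
    snprintf(as_integer, sizeof as_integer, "%lld", n);
    return strcmp(as_double, as_integer) == 0;
}

/* Spot-checks a spread of values; returns the number of mismatches. */
int printf_self_test(void) {
    static const long long samples[] = {
        0, 1, -1, 7, 10, 100, 12345, -99999,
        2147483647LL, -2147483648LL, 1000000000000000LL,
        9007199254740992LL, -9007199254740992LL
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (!printf_agrees(samples[i])) {
            fprintf(stderr, "mismatch for %lld\n", samples[i]);
            failures++;
        }
    }
    return failures;
}
```

A nonzero result from `printf_self_test()` on your platform is a sign that the vendor complaint described above is warranted.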

-- MartinHollis (original author)