• Subject: Some statistical data on double rounding
• From: KHMan <keinhong@...>
• Date: Fri, 8 Jun 2018 17:47:40 +0800

```Hi all,

```
(I will avoid opinions in the following. Y'all are free to decide whatever y'all wish to do. :-p)
```
```
Tests were done on generic MinGW binaries of lua-5.4.0-work1. The 32-bit binary has the double-rounding issue due to extended precision that was discussed in the past few days. The 64-bit binary uses the 64-bit FPU datapath and is free of double-rounding.
```
```
Tests were done on the output of math.random(), which is between 0 and 1. Any zeros are skipped. The first tests are A+B, A-B, A*B and A/B. A and B do not change for the 4 ops. The number of different results between the 32-bit binary and the 64-bit binary is counted.
```
Operation / Op_Count / Diff_Count
=================================
A+B  1,000,000,000  0
A-B  1,000,000,000  0
A*B  1,000,000,000  243,584
A/B  1,000,000,000  244,107

For A*B and A/B, this is roughly 1 in 4000.

```
All different results differ by 1 ULP. Nothing more than 1 ULP was seen. Here is a look at one result:
```
Operation : *
n1  = 0.18987016622488206	0x1.84daa65360288p-3
n2  = 0.98339548253950804	0x1.f77f9cd91528bp-1
32b = 0.18671746373457448	0x1.7e65b9c2a810ep-3
64b = 0.18671746373457451	0x1.7e65b9c2a810fp-3

```
Plugging the values into an online calculator that has binary128 [1], we can do a check with a 'better' result, assuming n1 and n2 is precise to many more digits. The comparison is as follows:
```
fpu8087   = 0.18671746373457448
binary128 = 0.1867174637345744950213132394217624
fpu64bit  = 0.18671746373457451

```
So results that differ have intermediate results that are almost exactly in-between, this is why there is a double-rounding issue. A casual check with a few other numbers show the same trend.
```
```
However, note that those decimal representations of the 64-bit floats are not exactly equal to the exact value of the binary representation -- they just have enough digits for round-tripping. Here are the hex representations of the significands:
```
fpu8087   = 7e65b9c2a810e
binary128 = 7E65B9C2A810E5EAB7A33195161D
fpu64bit  = 7e65b9c2a810f

```
Using this comparison, it appears that the binary128 result is nearer to fpu8087 than to fpu64bit. We can also verify this by getting a more accurate decimal version of the 64-bit binary representation (done by manual fiddling, so not definitive):
```
fpu8087   = 0.1867174637345744847571893387794262
binary128 = 0.1867174637345744950213132394217624
fpu64bit  = 0.1867174637345745125127649544083397

```
Thus it is confirmed that the fpu8087 result is closer to the binary128 result, at least for this example.
```
This ends the first part.

```
In order to generate hits for A+B and A-B, the magnitude of B is changed. B is divided by 1e3, 1e6, 1e9 or 1e12, and then the addition is performed and the comparison made.
```
Operation / Op_Count / Diff_Count
=================================
A+(B/1e3)   1,000,000,000  0
A+(B/1e6)   1,000,000,000  0
A+(B/1e9)   1,000,000,000  244,158
A+(B/1e12)  1,000,000,000  243,489

```
Now we can see the double-rounding differences. For the last two operations, the rate is also roughly 1 in 4000. Again, all different results differ by 1 ULP. Nothing more than 1 ULP was seen.
```
This ends the second part.

[1] http://weitz.de/ieee/

--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia

```