- Subject: Re: Lua 5.1 (alpha) now available
- From: Mike Pall <mikelu-0509@...>
- Date: Sat, 3 Sep 2005 14:15:55 +0200
Hi,
David Burgess wrote:
> Beats me. How  does the following from luaconf.h work? 
> 
> union luai_Cast { double l_d; long l_l; };
> #define lua_number2int(i,d) \
>   { volatile union luai_Cast u; u.l_d = (d) + 6755399441055744.0; (i) = u.l_l; }
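A well-known trick. Add 2^52+2^51 to a double (53 bits of
precision): the sum then has no room left for fractional bits, so it
is rounded to an integer, and that integer ends up in the least
significant bits of the mantissa. The extra 2^51 keeps the sum within
[2^52, 2^53) for negative inputs too, so the exponent stays fixed.
The low 32 bits of the mantissa happen to line up with a 32 bit
integer starting at the same memory offset (with IEEE 754 doubles
and on a little-endian machine only).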
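A minimal C sketch of the same idea, not the luaconf.h code itself. Instead of the union (which relies on little-endian layout and a 32 bit long), it copies the sum into a uint64_t and keeps the low 32 bits arithmetically. It assumes IEEE 754 doubles, the default round-to-nearest mode, doubles and 64 bit integers sharing a byte order (true on common platforms), and |d| < 2^31. The name magic_to_int is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* 2^52 + 2^51, the same constant as in luaconf.h. */
static const double MAGIC = 6755399441055744.0;

/* Hypothetical helper: round d to the nearest int32 with the
 * magic-number trick. The integer lands in the low mantissa bits;
 * taking them from a uint64_t avoids any byte-order assumption. */
static int32_t magic_to_int(double d)
{
    double t = d + MAGIC;      /* rounds d; integer now in low bits */
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &t, sizeof bits);        /* reinterpret, no union */
    low = (uint32_t)bits;                  /* low 32 mantissa bits */
    memcpy(&result, &low, sizeof result);  /* two's complement view */
    return result;
}
```

Note that magic_to_int(2.5) gives 2 and magic_to_int(3.5) gives 4: the trick rounds half to even instead of truncating like a plain (int) cast.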
Try this Lua program and you will be enlightened:
  local magic = 2^52+2^51
  for x=0,2,0.125 do print(x, x+magic-magic) end
[This assumes that temporary results of FP arithmetic are stored
as 64 bit IEEE 754 doubles; a wider register, such as an 80 bit x87
register, would keep the fraction. The assumption holds for
standard Lua on most machines.]
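For comparison, a C sketch of the same experiment (magic_round and print_magic_table are hypothetical names). The volatile temporary reflects the assumption in brackets: it forces the sum to be stored as a 64 bit double instead of staying in a wider register.

```c
#include <stdio.h>

/* Round x to the nearest integer (ties to even, under the default
 * rounding mode) by adding and subtracting 2^52 + 2^51. */
static double magic_round(double x)
{
    const double magic = 6755399441055744.0;  /* 2^52 + 2^51 */
    volatile double t = x + magic;  /* stored as a 64 bit double */
    return t - magic;
}

/* Prints the same table as the Lua loop: x and its rounded value. */
static void print_magic_table(void)
{
    double x;
    for (x = 0.0; x <= 2.0; x += 0.125)  /* 0.125 steps are exact */
        printf("%g\t%g\n", x, magic_round(x));
}
```

Under the default rounding mode, 0.5 maps to 0 and 1.5 maps to 2, since ties go to the even neighbour.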
Unfortunately gcc 3.3.5 produces less-than-optimal code for the
above idiom in the two places in ltable.c where it matters most.
Still, the performance is similar to or better than that of the
assembler variants we had previously (because gcc screws up the
code surrounding those even more).
The 'magic' code is of course a lot more portable across x86
compilers than the assembler variants.
Canonical benchmark for lua_number2int():
  local t={1}; local x=1; for i=1,1e7 do local y = t[x] end
Without special-casing the code for x86 boxes, this runs 25% slower
(at least on a PIII). The difference is less pronounced on newer CPU
designs (e.g. Xeon or Opteron), because they have extra logic to
avoid a flush of the FP pipeline when only the rounding mode in the
control word is changed.
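A rough C analogue of the benchmark, as a sketch only: it converts the integral key 1.0 n times, once with a plain (int) cast and once with the magic trick, and prints the CPU time for each. All function names are hypothetical. Timings depend heavily on compiler and flags; an x86-64 build uses SSE2 for the cast, so the x87 penalty described above would only show up in an x87 build (e.g. gcc -m32 -mfpmath=387).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Plain C conversion of the same key, n times. */
static long convert_with_cast(long n)
{
    volatile double key = 1.0;  /* volatile: reloaded every iteration */
    long i, sum = 0;
    for (i = 0; i < n; i++)
        sum += (int)key;        /* truncating C conversion */
    return sum;
}

/* Magic-number conversion of the same key, n times. */
static long convert_with_magic(long n)
{
    volatile double key = 1.0;
    long i, sum = 0;
    for (i = 0; i < n; i++) {
        double t = key + 6755399441055744.0;  /* 2^52 + 2^51 */
        uint64_t bits;
        uint32_t low;
        int32_t v;
        memcpy(&bits, &t, sizeof bits);
        low = (uint32_t)bits;                 /* low mantissa bits */
        memcpy(&v, &low, sizeof v);
        sum += v;
    }
    return sum;
}

/* Runs both loops, prints elapsed CPU seconds, and returns 1 if both
 * checksums equal n (they should, since the key is integral). */
static int run_benchmark(long n)
{
    clock_t t0 = clock();
    long a = convert_with_cast(n);
    clock_t t1 = clock();
    long b = convert_with_magic(n);
    clock_t t2 = clock();

    printf("cast:  %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("magic: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return a == b && a == n;
}
```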
Note that all of the above is irrelevant for non-x86 CPUs or when
SSE2 is available, because these _do_ supply a fast 'truncate FP to
integer' operation.
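One consequence worth keeping in mind, shown as a small C sketch (truncate_to_int and round_to_int are hypothetical names): a C cast truncates toward zero, while the magic trick rounds to nearest, so the two agree only for integral values such as the table keys in the benchmark.

```c
/* Plain C conversion: truncates toward zero. With SSE2 this
 * typically compiles to a single cvttsd2si instruction, so no
 * rounding-mode switch is needed. */
static int truncate_to_int(double d)
{
    return (int)d;
}

/* Magic-number conversion: rounds to nearest, ties to even.
 * Assumes IEEE 754 doubles and the default rounding mode; the
 * volatile store keeps the sum at 64 bit double precision. */
static int round_to_int(double d)
{
    const double magic = 6755399441055744.0;  /* 2^52 + 2^51 */
    volatile double t = d + magic;
    return (int)(t - magic);  /* t - magic is already integral */
}
```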
Bye,
     Mike