lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Alex Davies wrote:
> I've been trying to figure out how jit 2.x manages to reduce numbers in 
> this instance to integers. I can see that the band/bxor could guarantee 
> that the numbers fit in the range [...]

Since Lua doesn't have a standard bit manipulation library (yet),
I took the liberty to define its semantics:

  The base numeric type used by LuaJIT is a double-precision
  floating-point number. But bit operations need to work on
  fixed-precision integral numbers. The following rules are
  intended to give a precise and useful definition (for the
  programmer), yet give the implementation (interpreter and
  compiler) the maximum freedom to apply optimizations:

  All bit operations take one or more numeric arguments and produce
  one numeric result. Coercion from strings to numbers is enabled
  by default, but is deprecated.

  Input arguments to bit operations are reduced to a 32 bit integer
  by taking their least-significant 32 bits. Numbers outside of the
  range +-2^51, +-Inf or NaN give undefined results. Non-integral
  numbers are truncated or rounded in an implementation-defined
  way. Undefined or implementation-defined behaviour means that the
  results could differ between versions or even between interpreted
  and compiled code. It's strongly advised to avoid these cases.

  The result of a bit operation is always a signed 32 bit integral
  number.

  This way it's possible to pass both signed and unsigned 32 bit
  numbers (but the result is always signed) and get wrap-around
  semantics for integer arithmetics. Examples:

    bit.tobit(1234)                    --> 1234
    bit.tobit(-1234)                   --> -1234
    bit.tobit(0xffffffff)              --> -1
    bit.tobit(0x87654321)              --> -2023406815
    bit.tobit(2^40+1234)               --> 1234
    bit.tobit(2^40-1234)               --> -1234
    bit.tobit(0x87654321 + 0x87654321) --> 248153666
    bit.band(1234, -1234)              --> 2
    bit.band(2^40+1234, 2^40-1234)     --> 2

  The use of bit.tobit is for illustration only -- it's usually
  redundant, because all operations reduce all of their inputs
  anyway. But you may want to use it for the final reduction steps
  in crypto algorithms which rely heavily on wrap-around semantics
  for integer addition or subtraction.

> but the additions?

The trace recorder first emits FP ADDs for an addition of course.
Input operands are converted to doubles first with TONUM. This is
to preserve the full FP precision and range.

Bit operations always "integerize" their operands. This can be as
simple as skipping a TONUM or inserting a TOBIT conversion. But it
also recognizes FP ADD/SUB and emits an INT ADD/SUB instead. This
effectively backpropagates and/or eliminates the (semi-expensive)
TOBIT as far as possible.

Together with CSE you get pure integer code for bit-intensive
workloads like crypto algorithms. For code like bxor(a+b, c-d) the
FP ADD/SUB are left as dead and are never encoded.

> The table access?

Well, the indexes here are constant integers. In the general case a
similar backpropagation algorithm is used. E.g. the FP ADD from a
t[i+100] is replaced with an INT ADDOV (checking for overflow).
This is only useful if 'i' is already an integer. That's why the
trace recorder does some ad-hoc range analysis to detect integer
induction variables (and avoid overflow checks for the increments).

The result of a table access has to be type-checked and converted
to an integer of course (except if store-to-load forwarding works).
I could unroll the loop to unpack the string into 32 bit chunks to
avoid that. But that awaits the planned integration of a
struct/lpack-like library (not right now).

You don't see an instance of the TOBIT conversion in the machine
code excerpt because it's done on the first load and all further
conversions are CSEd. You only get to see some loads from stack
slots, but these are just restores for spilled operands.

--Mike