Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4
From: Mike Pall <mikelu-1010@...>
Date: Wed, 13 Oct 2010 11:42:21 +0200

David Manura wrote:
> As LuaBitOp v.s. Lua-5.2.0-work4 bit libraries has been a point of
> contention, I thought I'd post a little anecdotal feedback tonight.
> LuaBitOp and Lua-5.2.0-work4 bitwise lshift behave differently here:
> 
>  bit.lshift(0xffffffff, 32) --> -1 (LuaJIT2) or 0 (lua-5.2.0-work4)
> 
> Indeed, the LuaBitOp docs say that only the lower 5 bits in the shift
> count are used (i.e. 0-31), and that follows from C and x86 [4].

This doesn't follow from C -- it's undefined behavior in C. Some
CPUs mask by the operand width (e.g. x86), others turn overflow
into zero or sign-extension, and some have an ugly mix. You really
don't want to inflict this mess onto users of a high-level language.

[It gets really ugly with ARM: the constant shifts allow left
shifts by 1-31, whereas the right shifts allow 1-32. But the
variable shifts ignore the upper 24 bits in the shift count _and_
overflow is turned into zero or sign-extension. Yuck!]

So you basically have two design choices for Lua bit shifts:
implement overflow semantics or mask the shift count.

>From a practical viewpoint, overflow semantics depend on having a
non-overflowing representation of the shift count, too. Since Lua
can be configured for either double or int32_t or int64_t numbers,
the shift count itself has inconsistent overflow behavior. Adding
overflow semantics needs extra conditionals on _all_ platforms, too.
Also, it's inconsistent with rotates and has fewer use cases.

OTOH masking the shift count is easy to explain, it's fully
portable to all platforms, it's consistent with rotates, it's
trivial to implement (and even free on some platforms), it has
useful mathematical properties and IMHO it has many more use cases.

So masking the shift count by the operand width gets us reasonable
and efficient cross-platform behavior for Lua bit operations. Oh,
and quite a few of the programs out there using Lua BitOp make use
of these semantics, since it's both useful and defined behavior.

Lua 5.2 would break many of them. Even worse, users cannot load
Lua BitOp anymore, since the incompatible, internal Lua 5.2
library takes precedence!

There are many more incompatibilities, e.g. the signedness of the
results depends on the Lua number type. Passing inputs for bit
operations from C to Lua with lua_pushinteger won't work portably,
too. Code that happens to work on your desktop will break in
unpredictable ways on embedded platforms with Lua 5.2.

Also, it doesn't support modular arithmetic for bit operations,
which is a major use case in crypto stuff. E.g. the following
expression from MD5 relies on additions with wrapping semantics:

  y = rol(bxor(c, bor(b, bnot(d))) + a + x, s) + b

That happens to work on x86 with Lua 5.2 by accident, but breaks
badly e.g. on ARM. In more complicated code you'll only notice
this when you hit just the right input operands. Good luck ...

The Lua 5.2 bit library is incompatible with Lua BitOp, it's
semantics are not well-defined and it's not cross-platform-safe.
It should really be renamed or dropped.

> $ lua52-work4 t.lua
> 255     255     8
> 255     65535   16
> 255     16777215        24
> 255     4294967295      32
> 4294967295      0       0
> 
> $ luajit2-20101011 t.lua
> 255     255     8
> 255     65535   16
> 255     16777215        24
> 255     -1      32
> 0       -1      0
> -----
> 
> The two bit libraries (and the pure Lua alternative) give the
> identical final result for nbits < 32, but for nbits == 32 the
> LuaBitOp breaks unless handled as a special case.  That's not to say
> that it is bad.  You may want Lua to have the same bitwise semantics
> as C to simplify conversion of C/native code to Lua (and vice
> versa--e.g. LuaToCee).

Actually your code will break with Lua 5.2 on non-x86 platforms,
too. Because with nbits = 32, bit.lshift(1, nbits)-1 evaluates to
0-1 = -1, which is not a valid input to bit.band() for the Lua 5.2
bit library. The result is unpredictable -- it just happens to
work on x86 by accident. So your code needs to be rewritten,
anyway (see the reply by KHMan).

E.g. here's the output for Lua 5.2 on PPC/e500v2:

255     255     8
255     65535   16
255     16777215        24
255     4294967295      32
0       0       0

Oops?!

Developers just love unpredictable behavior, right? Yet another
example, which shows why I'm insisting on portable and consistent
semantics. Lua BitOp returns consistent results on all platforms.

> Another area of difficulty involves a function that reverses the order
> of bits [5].  There is not an easy way to optimize the pure Lua form
> with existing native bitwise operations:

That's troublesome in C, too. Only a few CPUs have special
instructions for that (e.g. rbit on ARMv6T2+). See this resource
for the common workarounds in C:
  http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious

> However, most of the other bit operations in
> ModuleCompressDeflateLua are shifts and modulus, which, although
> cleaner with their native operations, tend to be slightly slower when
> implemented as function calls (except maybe LuaJIT2).
> 
>  local x = 0
>  local rshift = bit.rshift
>  for i=1,1000000000 do
>    x = (x - x % 256) /  256
>    --x = rshift(x, 8)
>  end

It's not too surprising that the version with rshift is much
faster with LuaJIT. It can be turned into a single machine code
instruction inside the loop:

->LOOP:
394cffe0  shr ebp, 0x08      <========
394cffe3  add ebx, +0x01
394cffe6  cmp ebx, 0x3b9aca00
394cffec  jle 0x394cffe0        ->LOOP

That's another benefit of the Lua BitOp semantics: they allow much
better code generation in LuaJIT.

--Mike

Follow-Ups:
- Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4, Leo Razoumov

References:
- bit.lshift and performance - luabitop v.s. lua-5.2.0-work4, David Manura

Prev by Date: Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4
Next by Date: Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4
Previous by thread: Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4
Next by thread: Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4
Index(es):
- Date
- Thread