[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: bit.lshift and performance - luabitop v.s. lua-5.2.0-work4
- From: Mike Pall <mikelu-1010@...>
- Date: Wed, 13 Oct 2010 11:42:21 +0200
David Manura wrote:
> As LuaBitOp v.s. Lua-5.2.0-work4 bit libraries has been a point of
> contention, I thought I'd post a little anecdotal feedback tonight.
> LuaBitOp and Lua-5.2.0-work4 bitwise lshift behave differently here:
>
> bit.lshift(0xffffffff, 32) --> -1 (LuaJIT2) or 0 (lua-5.2.0-work4)
>
> Indeed, the LuaBitOp docs say that only the lower 5 bits in the shift
> count are used (i.e. 0-31), and that follows from C and x86 [4].
This doesn't follow from C -- it's undefined behavior in C. Some
CPUs mask by the operand width (e.g. x86), others turn overflow
into zero or sign-extension, and some have an ugly mix. You really
don't want to inflict this mess onto users of a high-level language.
[It gets really ugly with ARM: the constant shifts allow left
shifts by 1-31, whereas the right shifts allow 1-32. But the
variable shifts ignore the upper 24 bits in the shift count _and_
overflow is turned into zero or sign-extension. Yuck!]
So you basically have two design choices for Lua bit shifts:
implement overflow semantics or mask the shift count.
>From a practical viewpoint, overflow semantics depend on having a
non-overflowing representation of the shift count, too. Since Lua
can be configured for either double or int32_t or int64_t numbers,
the shift count itself has inconsistent overflow behavior. Adding
overflow semantics needs extra conditionals on _all_ platforms, too.
Also, it's inconsistent with rotates and has fewer use cases.
OTOH masking the shift count is easy to explain, it's fully
portable to all platforms, it's consistent with rotates, it's
trivial to implement (and even free on some platforms), it has
useful mathematical properties and IMHO it has many more use cases.
So masking the shift count by the operand width gets us reasonable
and efficient cross-platform behavior for Lua bit operations. Oh,
and quite a few of the programs out there using Lua BitOp make use
of these semantics, since it's both useful and defined behavior.
Lua 5.2 would break many of them. Even worse, users cannot load
Lua BitOp anymore, since the incompatible, internal Lua 5.2
library takes precedence!
There are many more incompatibilities, e.g. the signedness of the
results depends on the Lua number type. Passing inputs for bit
operations from C to Lua with lua_pushinteger won't work portably,
too. Code that happens to work on your desktop will break in
unpredictable ways on embedded platforms with Lua 5.2.
Also, it doesn't support modular arithmetic for bit operations,
which is a major use case in crypto stuff. E.g. the following
expression from MD5 relies on additions with wrapping semantics:
y = rol(bxor(c, bor(b, bnot(d))) + a + x, s) + b
That happens to work on x86 with Lua 5.2 by accident, but breaks
badly e.g. on ARM. In more complicated code you'll only notice
this when you hit just the right input operands. Good luck ...
The Lua 5.2 bit library is incompatible with Lua BitOp, it's
semantics are not well-defined and it's not cross-platform-safe.
It should really be renamed or dropped.
> $ lua52-work4 t.lua
> 255 255 8
> 255 65535 16
> 255 16777215 24
> 255 4294967295 32
> 4294967295 0 0
>
> $ luajit2-20101011 t.lua
> 255 255 8
> 255 65535 16
> 255 16777215 24
> 255 -1 32
> 0 -1 0
> -----
>
> The two bit libraries (and the pure Lua alternative) give the
> identical final result for nbits < 32, but for nbits == 32 the
> LuaBitOp breaks unless handled as a special case. That's not to say
> that it is bad. You may want Lua to have the same bitwise semantics
> as C to simplify conversion of C/native code to Lua (and vice
> versa--e.g. LuaToCee).
Actually your code will break with Lua 5.2 on non-x86 platforms,
too. Because with nbits = 32, bit.lshift(1, nbits)-1 evaluates to
0-1 = -1, which is not a valid input to bit.band() for the Lua 5.2
bit library. The result is unpredictable -- it just happens to
work on x86 by accident. So your code needs to be rewritten,
anyway (see the reply by KHMan).
E.g. here's the output for Lua 5.2 on PPC/e500v2:
255 255 8
255 65535 16
255 16777215 24
255 4294967295 32
0 0 0
Oops?!
Developers just love unpredictable behavior, right? Yet another
example, which shows why I'm insisting on portable and consistent
semantics. Lua BitOp returns consistent results on all platforms.
> Another area of difficulty involves a function that reverses the order
> of bits [5]. There is not an easy way to optimize the pure Lua form
> with existing native bitwise operations:
That's troublesome in C, too. Only a few CPUs have special
instructions for that (e.g. rbit on ARMv6T2+). See this resource
for the common workarounds in C:
http://graphics.stanford.edu/~seander/bithacks.html#BitReverseObvious
> However, most of the other bit operations in
> ModuleCompressDeflateLua are shifts and modulus, which, although
> cleaner with their native operations, tend to be slightly slower when
> implemented as function calls (except maybe LuaJIT2).
>
> local x = 0
> local rshift = bit.rshift
> for i=1,1000000000 do
> x = (x - x % 256) / 256
> --x = rshift(x, 8)
> end
It's not too surprising that the version with rshift is much
faster with LuaJIT. It can be turned into a single machine code
instruction inside the loop:
->LOOP:
394cffe0 shr ebp, 0x08 <========
394cffe3 add ebx, +0x01
394cffe6 cmp ebx, 0x3b9aca00
394cffec jle 0x394cffe0 ->LOOP
That's another benefit of the Lua BitOp semantics: they allow much
better code generation in LuaJIT.
--Mike