- Subject: Re: LuaJIT - loop conditional code generation
- From: Mike Pall <mikelu-1202@...>
- Date: Wed, 29 Feb 2012 20:49:07 +0100
Jani Piitulainen wrote:
> Branchless inner loop is indeed significantly faster, in this case 120%.
And the generated code is very good, too:
->LOOP:
394cffc0 mov r14d, r15d
394cffc3 and r14d, +0x07
394cffc7 lea r13d, [r14-0x1]
394cffcb shr r13d, 0x1f
394cffcf add ebx, r13d
394cffd2 xor r14d, +0x03
394cffd6 add r14d, -0x01
394cffda shr r14d, 0x1f
394cffde sub ebx, r14d
394cffe1 add ebp, ebx
394cffe3 add r15d, +0x01
394cffe7 cmp r15d, 0x05f5e100
394cffee jle 0x394cffc0 ->LOOP
394cfff0 jmp 0x394c001c ->3
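Reading the trace by hand, the loop counts i up to 100000000 (0x05f5e100, inclusive because of jle) and takes m = i & 7. It adds 1 to a running value when m == 0 and subtracts 1 when m == 3, then adds that running value to a sum. A minimal C sketch of that reading follows. It assumes 32-bit integers, and the names, the loop's start value and the branchy comparison function are hypothetical, since the original Lua source is not quoted in this message. The trick is that (m - 1) >> 31 on an unsigned 32-bit value is 1 only when m == 0 (for small m). This is what each lea/shr (or add/shr) pair computes.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the x86-64 trace above; names and the
   start value of i are assumptions, not the original Lua code. */

/* Returns 1 if v == 0, else 0, valid for v in 0..7: v - 1 wraps to
   0xFFFFFFFF only when v == 0, so bit 31 is set only in that case. */
static uint32_t is_zero_small(uint32_t v) {
    return (v - 1u) >> 31;
}

/* Branchless kernel, mirroring and/lea/shr/add and xor/add/shr/sub. */
int32_t run_branchless(int32_t first, int32_t last) {
    int32_t x = 0, sum = 0;
    for (int32_t i = first; i <= last; i++) {
        uint32_t m = (uint32_t)i & 7u;          /* and r14d, +0x07      */
        x += (int32_t)is_zero_small(m);         /* +1 when m == 0       */
        x -= (int32_t)is_zero_small(m ^ 3u);    /* -1 when m == 3       */
        sum += x;                               /* add ebp, ebx         */
    }
    return sum;
}

/* Branchy equivalent (hypothetical), for comparing results. */
int32_t run_branchy(int32_t first, int32_t last) {
    int32_t x = 0, sum = 0;
    for (int32_t i = first; i <= last; i++) {
        int32_t m = i & 7;
        if (m == 0) x++;
        else if (m == 3) x--;
        sum += x;
    }
    return sum;
}
```

For example, run_branchless(1, 8) and run_branchy(1, 8) both return -5. Both functions compute the same values, but only the first one has no data-dependent branch inside the loop body.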
The ARM code is pretty cool (side effect of -Ofuse optimization):
->LOOP:
00367fd4 and r8, r9, #7
00367fd8 sub r7, r8, #1
00367fdc add r10, r10, r7, lsr #31
00367fe0 eor r8, r8, #3
00367fe4 sub r8, r8, #1
00367fe8 sub r10, r10, r8, lsr #31
00367fec add r11, r10, r11
00367ff0 add r9, r9, #1
00367ff4 cmp r9, r0
00367ff8 ble 0x00367fd4 ->LOOP
00367ffc bl 0x00360024 ->3
--Mike