It seems to be some combination of x64 plus the narrow and loop optimizations.

These crash:
luajit64 -O2 bug.lua
luajit64 -O1 -O+narrow -O+loop bug.lua

These work (they are the same thing):
luajit64 -O1 -O+narrow bug.lua
luajit64 -O2 -O-loop   bug.lua

These work too (again, the same thing):
luajit64 -O1 -O+loop   bug.lua
luajit64 -O2 -O-narrow bug.lua
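
The six combinations above can be run in one pass. This is only a minimal sketch: it assumes the binary is called luajit64 and the script bug.lua, as in the commands above, and the function name run_matrix is made up for illustration.

```shell
#!/bin/sh
# Sketch: run one script under every optimization combination listed above.
# run_matrix is a hypothetical helper; $1 is the LuaJIT binary, $2 the script.
run_matrix() {
  luajit=$1
  script=$2
  for flags in \
    "-O2" \
    "-O1 -O+narrow -O+loop" \
    "-O1 -O+narrow" \
    "-O2 -O-loop" \
    "-O1 -O+loop" \
    "-O2 -O-narrow"
  do
    # $flags is deliberately unquoted so it splits into separate options.
    if "$luajit" $flags "$script" >/dev/null 2>&1; then
      echo "ok    $flags"
    else
      echo "FAIL  $flags (exit status $?)"
    fi
  done
}

# Binary and script names as used in this thread; adjust for your setup.
run_matrix luajit64 bug.lua
```

A crash shows up as a non-zero exit status; a process killed by a signal is normally reported by the shell as 128 plus the signal number.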

On 8/11/2011 11:42 AM, Dimiter 'malkia' Stanev wrote:
It works with -joff here (Windows 7). It works for 32-bit and crashes for 64-bit.

I've reduced the code to this:

local ffi = require "ffi"

local bmask_1,bmask_0 = ffi.new("uint8_t[8]"),ffi.new("uint8_t[8]")
local allones = bit.tobit(0xff)
for i=0,7 do
  bmask_1[i] = bit.lshift(1,7-i)
  bmask_0[i] = bit.bxor(allones,bmask_1[i])
end

local band = bit.band

local bit_set_0 = function(_bset,_byte,_bit)
  -- io.stderr:write("A ")
  -- if _bit == 0 then end
  local y = bmask_0[_bit]
  -- io.stderr:write("B ")
  _bset[_byte] = band(_bset[_byte],bmask_0[_bit])
end

local NBYTES = 30
local _bset = ffi.new("uint8_t[?]",NBYTES)

for i=1,2 do
  local _byte,_bit = 0,-1
  for j=1,NBYTES*8 do
    _bit = _bit + 1
    if _bit == 8 then _byte,_bit = _byte+1,0 end
    -- io.stderr:write(_byte," ",_bit,"\n")
    bit_set_0(_bset,_byte,_bit)
  end
end

And after dumping

luajit32 -jdump > 1
luajit64 -jdump > 2

and keeping only the last page of each, I see that the 64-bit code never
loads EAX, which is used in the comparison for the first loop
(for i=1,2). In the 32-bit code that register is ESI.
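
Trimming a dump down to its tail can be scripted. The sketch below reads "the last page" as "the last trace", which is an interpretation, and it assumes that every trace in the -jdump output begins with a line of the form "---- TRACE <n> start". The helper name last_trace is made up.

```shell
#!/bin/sh
# Sketch: keep only the last trace of a -jdump output file.
# Assumption: each trace begins with a line like "---- TRACE 3 start ...".
last_trace() {
  awk '
    /^---- TRACE [0-9]+ start/ { buf = "" }   # a new trace begins: drop what came before
    { buf = buf $0 "\n" }                      # collect the current trace line by line
    END { printf "%s", buf }                   # print whatever was collected last
  ' "$1"
}

# "1" and "2" are the dump file names used above; skip any that do not exist.
for f in 1 2; do
  if [ -f "$f" ]; then
    last_trace "$f" > "$f.last"
  fi
done
```

The two resulting files can then be compared side by side, which is how a missing register load like the one described here becomes visible.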

But I can hardly fix something like that. Here is the disassembly:

32-bit:

7efafdd8 movaps xmm5, xmm7
7efafddb mov esi, [esp+0xc] -> ESI is loaded here
7efafddf mov dword [0x001b02bc], 0x2
7efafde9 mov edx, [0x001b02cc]
7efafdef xorps xmm6, xmm6
7efafdf2 movsd xmm3, [0x001b8040]
7efafdfa cmp dword [edx+0x64], -0x0f
7efafdfe jnb 0x7efa0008 ->0
7efafe04 movsd xmm4, [edx+0x60]
7efafe09 movaps xmm7, xmm4
7efafe0c addsd xmm7, xmm3
7efafe10 cmp dword [edx+0x2c], -0x09
7efafe14 jnz 0x7efa0008 ->0
7efafe1a cmp dword [edx+0x3c], -0x0b
7efafe1e jnz 0x7efa0008 ->0
7efafe24 mov ebp, [edx+0x38]
7efafe27 cmp dword [edx+0x28], 0x001b8a20
7efafe2e jnz 0x7efa0008 ->0
7efafe34 cmp dword [edx+0x14], -0x0b
7efafe38 jnz 0x7efa0008 ->0
7efafe3e mov ebx, [edx+0x10]
7efafe41 cmp word [ebx+0x6], +0x5f
7efafe46 jnz 0x7efa0008 ->0
7efafe4c movzx ebx, byte [ebx+0x8]
7efafe50 cmp dword [edx+0x24], -0x09
7efafe54 jnz 0x7efa0008 ->0
7efafe5a cmp word [ebp+0x6], +0x60
7efafe5f jnz 0x7efa0008 ->0
7efafe65 cvttsd2si ecx, xmm4
7efafe69 movzx eax, byte [ecx+ebp+0x9]
7efafe6e cmp dword [edx+0x20], 0x001b6288
7efafe75 jnz 0x7efa0008 ->0
7efafe7b and ebx, eax
7efafe7d mov [ecx+ebp+0x9], bl
7efafe81 add edi, +0x01
7efafe84 cmp edi, esi -> Compared here (for i=1,2)
7efafe86 jg 0x7efa000c ->1
7efafe8c xorps xmm5, xmm5
7efafe8f cvtsi2sd xmm5, edi
7efafe93 movsd [edx+0x88], xmm5
7efafe9b movsd [edx+0x70], xmm5
7efafea0 movsd [edx+0x68], xmm6
7efafea5 movsd [edx+0x60], xmm7
7efafeaa jmp 0x7efafeb5

64-bit (the load is missing here; I looked further up in the disassembly
and did not see EAX assigned anywhere; EAX is used here instead of ESI):

b911fdbe movaps xmm15, xmm7
///// missing EAX
b911fdc2 mov dword [0x001c04a0], 0x2
b911fdcd mov edx, r10d
b911fdd0 movsd xmm13, [0x001d6dc0]
b911fdda xorps xmm6, xmm6
b911fddd cmp dword [rdx+0x64], 0xfffeffff
b911fde4 jnb 0x1b9110010 ->0
b911fdea movsd xmm14, [rdx+0x60]
b911fdf0 movaps xmm7, xmm14
b911fdf4 addsd xmm7, xmm13
b911fdf9 cmp dword [rdx+0x2c], -0x09
b911fdfd jnz 0x1b9110010 ->0
b911fe03 cmp dword [rdx+0x3c], -0x0b
b911fe07 jnz 0x1b9110010 ->0
b911fe0d mov ebp, [rdx+0x38]
b911fe10 cmp dword [rdx+0x28], 0x001cad70
b911fe17 jnz 0x1b9110010 ->0
b911fe1d cmp dword [rdx+0x14], -0x0b
b911fe21 jnz 0x1b9110010 ->0
b911fe27 mov esi, [rdx+0x10]
b911fe2a cmp word [rsi+0x6], +0x5f
b911fe2f jnz 0x1b9110010 ->0
b911fe35 movzx esi, byte [rsi+0x8]
b911fe39 cmp dword [rdx+0x24], -0x09
b911fe3d jnz 0x1b9110010 ->0
b911fe43 cmp word [rbp+0x6], +0x60
b911fe48 jnz 0x1b9110010 ->0
b911fe4e cvttsd2si ebx, xmm14
b911fe53 movzx r15d, byte [rbx+rbp+0x9]
b911fe59 cmp dword [rdx+0x20], 0x001c6fb8
b911fe60 jnz 0x1b9110010 ->0
b911fe66 and esi, r15d
b911fe69 mov [rbx+rbp+0x9], sil
b911fe6e add edi, +0x01
b911fe71 cmp edi, eax -> Compared here (for i=1,2)
b911fe73 jg 0x1b9110014 ->1
b911fe79 xorps xmm15, xmm15
b911fe7d cvtsi2sd xmm15, edi
b911fe82 movsd [rdx+0x88], xmm15
b911fe8b movsd [rdx+0x70], xmm15
b911fe91 movsd [rdx+0x68], xmm6
b911fe96 movsd [rdx+0x60], xmm7
b911fe9b jmp 0x1b911fea7

I'm no Mike Pall, but I'm sure he'll fix it as soon as he sees it.

On 8/11/2011 10:49 AM, Pierre Chapuis wrote:
Hello,

I have code that uses the FFI that segfaults when I run it in
LuaJIT-2.0.0-beta8 with hotfix #1 on Mac OS X. It is part of the code
that implements a bitset.

Here is a simplified version:

#!/usr/bin/env luajit -lluarocks.loader

local ffi = require "ffi"

local bmask_1,bmask_0 = ffi.new("uint8_t[8]"),ffi.new("uint8_t[8]")
local allones = bit.tobit(0xff)
for i=0,7 do
  bmask_1[i] = bit.lshift(1,7-i)
  bmask_0[i] = bit.bxor(allones,bmask_1[i])
end

local bit_set_0 = function(_bset,_byte,_bit)
  io.stderr:write("A\n")
  -- if _bit == 0 then end
  local y = bmask_0[_bit]
  io.stderr:write("B\n")
  _bset[_byte] = bit.band(_bset[_byte],bmask_0[_bit])
end

local NBYTES = 30
local _bset = ffi.new("uint8_t[?]",NBYTES)

for i=1,2 do
  local _byte,_bit = 0,-1
  for j=1,NBYTES*8 do
    _bit = _bit + 1
    if _bit == 8 then _byte,_bit = _byte+1,0 end
    io.stderr:write(_byte," ",_bit,"\n")
    bit_set_0(_bset,_byte,_bit)
  end
end

If I run this script, it segfaults at the beginning of the second
iteration of the outer loop (for i=1,2 do). The output is:

[...]
A
B
29 6
A
B
29 7
A
B
0 0
A
Segmentation fault

If I run it in GDB I get:

[...]
A
B
29 6
A
B
29 7
A
B
0 0
A

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000100085240
0x00000001390bfdea in ?? ()

It does NOT segfault for NBYTES < 22. It does NOT segfault if I
uncomment the line that has been commented out (if _bit == 0 then end).
It does NOT segfault if I do not require luarocks.loader.

Does anybody have an idea of why this happens or how I could help debug
it? Can anybody reproduce it?

Thanks for your help,