I believe the problem is in the function atomic() in lgc.c, at line 531:

531  /* remark objects cautch by write barrier */
532  propagateall(g);
533  /* remark occasional upvalues of (maybe) dead threads */
534  remarkupvals(g);
535  /* remark weak tables */
536  g->gray = g->weak;
537  g->weak = NULL;

Line 534 marks the open upvalues, which in effect marks the objects they refer to. However, those marks are never propagated, because the gray list is overwritten at line 536. (The explanation for remarkupvals is in Roberto's very useful presentation on Lua's internals.)

Moving the call to propagateall() at line 532 so that it comes after line 534 seems to solve the problem.
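
To illustrate why the ordering matters, here is a toy model in C. It is only a sketch and not Lua's actual code: the Obj struct, the single child pointer, and the function atomic_keeps_child() are all made up for the example, and mark() merely stands in for what remarkupvals() does to the upvalue's value. It shows that an object grayed just before the gray list is overwritten never gets its children marked.

```c
#include <assert.h>
#include <stddef.h>

/* Toy tri-color model; hypothetical types, not Lua's real data structures. */
enum color { WHITE, GRAY, BLACK };

typedef struct Obj {
    enum color color;
    struct Obj *child;     /* single outgoing reference, or NULL */
    struct Obj *graynext;  /* link in the gray list */
} Obj;

static Obj *gray = NULL;   /* objects marked but not yet traversed */

/* Turn a white object gray and queue it for traversal. */
static void mark(Obj *o) {
    if (o != NULL && o->color == WHITE) {
        o->color = GRAY;
        o->graynext = gray;
        gray = o;
    }
}

/* Traverse gray objects until the gray list is empty. */
static void propagateall(void) {
    while (gray != NULL) {
        Obj *o = gray;
        gray = o->graynext;
        o->color = BLACK;
        mark(o->child);
    }
}

/* Returns 1 if the object reachable only through the upvalue's value
   ends up marked, 0 if it is left white (and would be swept). */
int atomic_keeps_child(int propagate_after_remark) {
    Obj child = { WHITE, NULL, NULL };
    Obj value = { WHITE, &child, NULL };  /* object held by an open upvalue */
    Obj *weak = NULL;                     /* empty weak list in this model */
    gray = NULL;

    if (!propagate_after_remark) propagateall();  /* original order: 532 first */
    mark(&value);                                 /* stand-in for remarkupvals */
    if (propagate_after_remark) propagateall();   /* proposed order */

    gray = weak;       /* the gray list is overwritten here */
    propagateall();    /* traverses only what came from the weak list */
    assert(gray == NULL);
    return child.color == BLACK;
}
```

In this model atomic_keeps_child(0) leaves the child white, while atomic_keeps_child(1) marks it, which matches the effect of moving the propagateall() call.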

(However, I wonder whether the weak tables should only be marked after the running thread and the basic metatables have been marked. I can't claim to fully understand the code yet. :)

R.

On 1-Aug-05, at 8:01 PM, Mike Pall wrote:

Hi,

uh oh ... I've been hunting this one for the past two days.
I still have no clue what's causing it or how to fix it.
At first I thought it was something with my changes, but I
get it with a plain vanilla copy of Lua 5.1-work6.

At least now I have a minimal testcase that guarantees a crash
or triggers an assertion (in case you turned them on). This
is originally from some macro processing inside PASM. It's
very condensed and has no useful functionality anymore.

But it triggers the bug pretty reliably:

local macros = {}
local capture

local function macro(name)
  capture = function(s)
    if s == "." then
      capture = nil
      macros[name] = {}
    end
  end
end

collectgarbage("setpause", 1)

local str = ""
for i=1,1000 do str = str..i.."|*|.|" end

for i=1,1000 do
  for line in string.gfind(str, "[^|]+") do
    if capture then capture(line) else macro(line) end
  end
end

Note that the code seems very fragile. Using the empty string
as an end marker instead of "." doesn't trigger it. Neither does
omitting the dummy "*". Dropping the strange string creation
and splitting fails to trigger it, too. But this seems more
related to stressing the GC with many strings than to the bug
itself.

It usually triggers in luaV_execute at OP_GETUPVAL, when 'name'
is retrieved. The assertion fails because isdead() is true
(only otherwhite is set). I.e. the (closed) upvalue is dead.
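
For readers unfamiliar with the two-white scheme behind isdead(), here is a small C sketch. It is a simplified model under my own assumptions, not the code from lgc.h: the ToyGC and ToyObj types, the bit layout, and the toy_ functions are hypothetical. The idea is that the collector flips its "current white" at the end of marking, so an object that was missed still carries the old white and tests as dead.

```c
#include <assert.h>

/* Hypothetical bit layout for this sketch only. */
#define WHITE0BIT 0
#define WHITE1BIT 1
#define BLACKBIT  2
#define bitmask(b) (1 << (b))
#define WHITEBITS  (bitmask(WHITE0BIT) | bitmask(WHITE1BIT))

typedef struct { unsigned char currentwhite; } ToyGC;  /* one white bit set */
typedef struct { unsigned char marked; } ToyObj;

/* The white that is NOT the current one. */
static unsigned char otherwhite(const ToyGC *g) {
    return (unsigned char)(g->currentwhite ^ WHITEBITS);
}

/* An object is dead if it still carries the other (old) white. */
int toy_isdead(const ToyGC *g, const ToyObj *o) {
    return (o->marked & otherwhite(g) & WHITEBITS) != 0;
}

/* End of the mark phase: swap which white counts as current. */
void toy_flipwhite(ToyGC *g) {
    g->currentwhite = otherwhite(g);
}

/* Returns 1 if an object that was never marked tests as dead
   after the flip, while a marked (black) object does not. */
int missed_object_is_dead(void) {
    ToyGC g = { bitmask(WHITE0BIT) };
    ToyObj missed = { bitmask(WHITE0BIT) };  /* never marked this cycle */
    ToyObj marked = { bitmask(BLACKBIT) };   /* reached by the marker */
    toy_flipwhite(&g);
    assert(!toy_isdead(&g, &marked));
    return toy_isdead(&g, &missed);
}
```

In this model a live object that the mark phase fails to reach is indistinguishable from garbage once the white is flipped, which is the state the assertion reports for the upvalue.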

The object chain should be:

  stack slots --> closure --> upvalue (closed)
  capture/call    (inner)     name

It seems important to use the string as a table key (at least
in the first few iterations that work just fine). The {} value
can be replaced with anything that drives the GC forward (e.g.
allocating a dummy closure with 'function() end').

The collectgarbage("setpause", 1) is just so it triggers earlier.
I can trigger it in the 'real' application even with default
settings. There I get assertions all over the place. Probably
as a consequence of other things going on that reuse the space
of the already free'd upvalue object.

With valgrind --tool=memcheck I see this:

Invalid read of size 4
   at luaV_execute (lvm.c:415)
   by luaD_call (ldo.c:355)
   by f_call (lapi.c:787)
   by luaD_rawrunprotected (ldo.c:94)
   by luaD_pcall (ldo.c:448)
   by lua_pcall (lapi.c:808)
   by docall (lua.c:93)
   by handle_argv (lua.c:314)
 Address 0x1BA5DC10 is 8 bytes inside a block of size 24 free'd
   at free (vg_replace_malloc.c:153)
   by l_alloc (lauxlib.c:676)
   by luaM_realloc_ (lmem.c:79)
   by luaF_freeupval (lfunc.c:92)
   by sweeplist (lgc.c:427)
   by singlestep (lgc.c:586)
   by luaC_step (lgc.c:618)
   by luaV_execute (lvm.c:706)

This confirms that the upvalue has been free'd even though
it's still referenced.

Ok, so far so bad. I'm sorry, I'm completely out of ideas.
Help! :-)

Bye,
     Mike