Arseny Vakhrushev wrote:
> However after that, I noticed the above error again in a simple
> block under high load:
> 
> ------------------------------------------------------------
> local cur = {}
> for uid, user in pairs(users) do   <--- error appears here!
>         table.insert(cur, user)
> end
> ------------------------------------------------------------
> 
> Here 'users' is the table from which I copy data to 'cur' for
> next traversal. No metatable is assigned to it, nothing like
> that. It is just a table consisting of pairs:
>    'uid' (string) => {} (table)

Well, this example works fine in isolation, no matter how hard I
try to break it.

> As you can see, I don't modify the contents of the table here in
> this block nor do I call anything which could yield the current
> coroutine or trigger a metamethod or anything which could modify
> the table.
> 
> I'm pretty sure it happens not due to memory corruption or
> anything. The system is very robust and works for months until a
> manual restart. Valgrind tests are brilliant on LuaJIT with all
> debug options turned on. The size of 'users' was about 1000
> entries and the system was executing a lot of requests when the
> error happened.

I don't see anything that could go wrong in the above loop itself.
Probably something else messed up the 'users' table first. Try
prepending a dummy loop which only iterates through the 'users'
table, but does nothing else. If the error moves, then the table
structure itself is defective. It's unlikely that Valgrind will
catch this, but recompiling with assertions turned on may help.
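
A minimal sketch of the dummy loop I mean. The contents of 'users'
here are a made-up stand-in, and the counter 'n' is only there so
the loop has a body; neither is from your code:

```lua
-- Hypothetical stand-in for the real 'users' table.
local users = { alice = {}, bob = {}, carol = {} }

-- Dummy traversal: visits every entry but modifies nothing.
-- If the "invalid key to 'next'" error now shows up here instead of
-- in the copy loop, the table was already damaged before this point.
local n = 0
for uid, user in pairs(users) do
  n = n + 1
end

-- The original copy loop follows unchanged.
local cur = {}
for uid, user in pairs(users) do
  table.insert(cur, user)
end
```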

Alternatively try adding a simple consistency check after every
modification of the 'users' table (e.g. checking the element types
and counting them).
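
Something along these lines might do as a check. The function name
'check_users' and the 'expected_count' argument are my own
assumptions; adapt them to however you track the number of users:

```lua
-- Hypothetical helper: verifies that every key is a string and every
-- value is a table, and optionally that the entry count matches.
local function check_users(users, expected_count)
  local count = 0
  for uid, user in pairs(users) do
    assert(type(uid) == "string", "non-string key in 'users'")
    assert(type(user) == "table", "non-table value in 'users'")
    count = count + 1
  end
  if expected_count then
    assert(count == expected_count,
           string.format("expected %d users, found %d",
                         expected_count, count))
  end
  return count
end

-- Usage: call it right after each modification.
local users = {}
users["u1"] = {}
check_users(users, 1)
users["u2"] = {}
check_users(users, 2)
```

Since the check traverses the table with pairs() itself, a damaged
table should make it fail right after the modification that broke
it, rather than in some unrelated loop later on.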

Or you could add the following before lj_err_msg(L, LJ_ERR_NEXTIDX)
in lj_tab.c:

  printf("key: %08x %08x\n", key->u32.hi, key->u32.lo);
  printf("x/%dx %p\n", 3*(t->hmask+1), noderef(t->node));

Run the app under gdb and set a breakpoint on lj_err_msg. After
the breakpoint hits, run the gdb command printed by the second
printf (the "x/..." line) and send me the complete output.

--Mike