> Well, this example works fine in isolation, no matter how hard I
> try to break it.

I was confident in that too.

> I don't see anything that could go wrong in the above loop itself.
> Probably something else messed up the 'users' table first.

What might that be, for instance? Anything?

> Try
> prepending a dummy loop which only iterates through the 'users'
> table, but does nothing else. If the error moves, then the table
> structure itself is defective. It's unlikely that Valgrind will
> catch this, but recompiling with assertions turned on may help.

The system keeps running after the error is raised, so I have to wait for the next maintenance
window to do that. What kind of inconsistency causes the error? I dug into the piece of code that
raises it and saw a comment: /* Unreachable */. Should I worry that this condition could cause a
crash?
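
For reference, a minimal sketch of the dummy loop suggested above. The contents of 'users' here
are a made-up stand-in (the real table is not shown in this thread), and dummy_walk is a
hypothetical helper name:

```lua
-- Hypothetical stand-in for the real 'users' table.
local users = { alice = { id = 1 }, bob = { id = 2 } }

-- Walk the whole table without reading or modifying any entry.
-- Returns the number of entries visited.
local function dummy_walk(t)
  local n = 0
  for _ in pairs(t) do
    n = n + 1
  end
  return n
end

-- Run this right before the loop that currently raises the error.
print("users entries: " .. dummy_walk(users))
```

If the error then shows up inside dummy_walk instead of the original loop, that would point at
the table itself rather than at the loop body.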

> Alternatively try adding a simple consistency check after every
> modification of the 'users' table (e.g. checking the element types
> and counting them).
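
A minimal sketch of such a check, assuming 'users' maps string keys to table values. Both the
layout and the check_users name are assumptions for illustration; the type tests would need to
match whatever the real table holds:

```lua
-- Verify that every key is a string and every value is a table, and
-- optionally that the number of entries matches expected_count.
-- Returns true and the entry count, or false and a description.
local function check_users(t, expected_count)
  local n = 0
  for k, v in pairs(t) do
    if type(k) ~= "string" then
      return false, "unexpected key type: " .. type(k)
    end
    if type(v) ~= "table" then
      return false, "unexpected value type for key " .. k .. ": " .. type(v)
    end
    n = n + 1
  end
  if expected_count and n ~= expected_count then
    return false, "expected " .. expected_count .. " entries, found " .. n
  end
  return true, n
end

-- Example: call the check after each modification of the table.
local users = { alice = {}, bob = {} }
users.carol = {}
assert(check_users(users, 3))
```

Tracking the expected count separately and passing it in lets the check catch entries that
silently appear or disappear, not only entries of the wrong type.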

> Or you could add the following before lj_err_msg(L, LJ_ERR_NEXTIDX)
> in lj_tab.c:

>   printf("key: %08x %08x\n", key->u32.hi, key->u32.lo);
>   printf("x/%dx %p\n", 3*(t->hmask+1), noderef(t->node));

Aha, ok. I will add that.

> Run the app with gdb and set a breakpoint on the lj_err_msg. After
> the breakpoint hits, run the printed gdb command and send me the
> complete output.

That is not so simple, since the app serves thousands of clients in real time and can't be
stopped whenever I want. :-) I will try to put together a stress test that emulates the behavior
which caused the error, and run it under gdb.

Thanks for the attention, Mike!

// Seny