This problem occurs only in our corporate application at my work. I'm sorry I couldn't reproduce it in a simple application.

But with the help of conditional breakpoints I think I managed to find the cause of L->nCcalls being less than L->nci, which led to it becoming a negative number (stored as a value >65500).

The reason is that the original L->nCcalls was always overwritten with a new value in lua_resume. This simple patch fixes it for me:

diff --git a/src/ldo.c b/src/ldo.c
index f2f9062..202516d 100644
--- a/src/ldo.c
+++ b/src/ldo.c
@@ -646,6 +646,7 @@ LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs,
int *nresults) {
int status;
unsigned short oldnny = L->nny; /* save "number of non-yieldable" calls */
+ unsigned short incnCcalls = (from) ? from->nCcalls + 1 : 1;
lua_lock(L);
if (L->status == LUA_OK) { /* may be starting a coroutine */
if (L->ci != &L->base_ci) /* not in base level? */
@@ -653,7 +654,7 @@ LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs,
}
else if (L->status != LUA_YIELD)
return resume_error(L, "cannot resume dead coroutine", nargs);
- L->nCcalls = (from) ? from->nCcalls + 1 : 1;
+ L->nCcalls += incnCcalls;
if (L->nCcalls >= LUAI_MAXCCALLS)
return resume_error(L, "C stack overflow", nargs);
luai_userstateresume(L, nargs);
@@ -677,7 +678,7 @@ LUA_API int lua_resume (lua_State *L, lua_State *from, int nargs,
*nresults = (status == LUA_YIELD) ? L->ci->u2.nyield
: cast_int(L->top - (L->ci->func + 1));
L->nny = oldnny; /* restore 'nny' */
- L->nCcalls--;
+ L->nCcalls -= incnCcalls;
lua_unlock(L);
return status;
}

This problem seems to have existed for more than 7 years, since this commit: https://github.com/lua/lua/commit/3dc5475e239e2da52a380288ae8b293a6b019f81#diff-ab0c850bf343709e04183d5d90fe2df7R528

On Wed, Jun 27, 2018 at 5:11 PM Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
> Sometimes when calling collectgarbage() inside a coroutine, my application
> crashes with 'C Stack Overflow'. I managed to track it down to a negative
> number being assigned to nCcalls in lstate.c and ldo.c, so nCcalls becomes
> ~65534.

Many thanks for the report.

Are you sure about the cause? In particular, can you check whether the
problem is that (L->nci > L->nCcalls) when luaD_rawrunprotected is
called?  That should never happen.

Can you produce a "small" self-sufficient program that reproduces the bug?

-- Roberto