Hi, I found an interesting stack overflow crash in my project.
Before we dive into the root cause of the crash, let’s try some interesting examples. I don’t think the test environment matters, but if you cannot reproduce the results of the examples below, try it with
- OS: Ubuntu 20.04 LTS
- glibc: UBUNTU GLIBC 2.3.1
- Lua: Lua 5.4.4 (commit hash 0e5071b5fbcc244d9f8c4bae82e327ad59bccc3f)
which is the same as mine.
---------------------------------------------------------------------------------------------------
[example1.lua] -- case normal
local function func()
  coroutine.wrap(func)()
end
func()
[result of example1.lua]
(Some repeated lines are skipped.)
.\example1.lua:2: .\example1.lua:2: .\example1.lua:2: .\example1.lua:2: C stack overflow
stack traceback:
[C]: in ?
.\example1.lua:2: in local 'func'
.\example1.lua:4: in main chunk
[C]: in ?
---------------------------------------------------------------------------------------------------
[example2.lua] -- case normal
local function func()
  print("Hello, Lua!")
  coroutine.wrap(func)()
end
func()
[result of example2.lua]
(Some repeated lines are skipped.)
Hello, Lua!
Hello, Lua!
Hello, Lua!
.\example1.lua:3: .\example1.lua:3: .\example1.lua:3: .\example1.lua:3: C stack overflow
stack traceback:
[C]: in ?
.\example1.lua:2: in local 'func'
.\example1.lua:4: in main chunk
[C]: in ?
---------------------------------------------------------------------------------------------------
As you can see from the examples, the Lua interpreter handles stack exhaustion caused by recursive coroutines well. You can find the detailed logic in ldo.c, mainly in lua_resume, resume and luaD_rawrunprotected. Let me explain the logic briefly.
When a coroutine is resumed, luaD_rawrunprotected saves the value of nCcalls in the state around the LUAI_TRY macro, so that it can be restored if an error occurs inside it. Also, when we resume a coroutine, the caller’s nCcalls is copied into the coroutine’s state. Note that the resume path itself increases the value of nCcalls. As a result, if we resume coroutines recursively, nCcalls grows higher and higher until it triggers an error, which is then handled by LUAI_THROW and LUAI_TRY at each nesting level in turn. (The luaE_checkcstack function in lstate.c handles this error.)
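The counting described above can be sketched as a small, self-contained C toy. This is not Lua's code: ToyState, ToyJmp, toy_resume, toy_body, toy_rawrunprotected and TOY_MAXCCALLS are hypothetical names made up for illustration, the limit of 200 is an assumption standing in for LUAI_MAXCCALLS, and plain setjmp/longjmp play the roles of LUAI_TRY and LUAI_THROW. It only models the depth counter being inherited, checked, incremented and restored.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: none of these names exist in Lua's sources. */
#define TOY_MAXCCALLS 200  /* assumed limit, standing in for LUAI_MAXCCALLS */

typedef struct ToyJmp { jmp_buf b; } ToyJmp;

typedef struct ToyState {
  unsigned nCcalls;  /* current C-call depth of this state */
  ToyJmp *errorJmp;  /* innermost active handler, or NULL */
} ToyState;

static unsigned toy_deepest = 0;  /* highest depth reached so far */

static int toy_resume(ToyState *from);

/* Like func() in example1.lua: every coroutine resumes one more coroutine. */
static void toy_body(ToyState *L) {
  if (toy_resume(L) != 0)        /* the inner resume failed ... */
    longjmp(L->errorJmp->b, 1);  /* ... so re-raise here (LUAI_THROW's role) */
}

/* Run f(L) with a handler installed; return 0 on success, 1 on error. */
static int toy_rawrunprotected(ToyState *L, void (*f)(ToyState *)) {
  unsigned oldnCcalls = L->nCcalls;  /* saved so it can be restored below */
  ToyJmp *previous = L->errorJmp;
  ToyJmp lj;
  int status = 0;
  L->errorJmp = &lj;
  if (setjmp(lj.b) == 0)  /* LUAI_TRY's role */
    f(L);
  else
    status = 1;           /* reached through longjmp */
  L->errorJmp = previous;   /* unchain the handler */
  L->nCcalls = oldnCcalls;  /* restore the depth saved on entry */
  return status;
}

/* Start a fresh state that inherits the caller's depth, then run its body. */
static int toy_resume(ToyState *from) {
  ToyState co;
  co.errorJmp = NULL;
  co.nCcalls = from ? from->nCcalls : 0;
  if (co.nCcalls >= TOY_MAXCCALLS)
    return 1;    /* too deep: report an error instead of recursing further */
  co.nCcalls++;  /* the resume itself counts as one more level */
  if (co.nCcalls > toy_deepest)
    toy_deepest = co.nCcalls;
  return toy_rawrunprotected(&co, toy_body);
}

#ifdef TOY_DEMO_MAIN  /* build with -DTOY_DEMO_MAIN to run the demo */
int main(void) {
  ToyState main_state = { 0, NULL };
  int status = toy_resume(&main_state);
  printf("status=%d deepest=%u\n", status, toy_deepest);
  return 0;
}
#endif
```

Built with -DTOY_DEMO_MAIN, this prints "status=1 deepest=200": the recursion stops at the assumed limit and the error is passed back through every nested handler, much like the repeated ".\example1.lua:2:" prefixes in the output above.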
Okay, that’s how recursive coroutines are handled. The errors above are no surprise. But how about the next example, crash.lua?
---------------------------------------------------------------------------------------------------
[crash.lua] -- case crash
local function func()
  pcall(1)
I think pcall fails here too.
The first argument should be a function.
So this:
static int luaB_pcall (lua_State *L) {
  int status;
  luaL_checkany(L, 1);
  lua_pushboolean(L, 1);  /* first result if no errors */
  lua_insert(L, 1);  /* put it in place */
  status = lua_pcallk(L, lua_gettop(L) - 2, LUA_MULTRET, 0, 0, finishpcall);
  return finishpcall(L, status, 0);
}
Should be:
static int luaB_pcall (lua_State *L) {
  int status;
  luaL_checktype(L, 1, LUA_TFUNCTION);  /* first argument must be a function */
  lua_pushboolean(L, 1);  /* first result if no errors */
  lua_insert(L, 1);  /* put it in place */
  status = lua_pcallk(L, lua_gettop(L) - 2, LUA_MULTRET, 0, 0, finishpcall);
  return finishpcall(L, status, 0);
}
  coroutine.wrap(func)()
end
func()
[result of crash.lua]
Segmentation fault (core dumped)
Anyway, in my local tests the Lua interpreter does not crash.
regards,
Ranier Vilela