|
Hello, List!
Recently, my friends and I found an interesting crash from the latest Lua Interpreter. Using address sanitizer, we figured out that heap buffer overflow crash. To find the exact cause of the crash and to provide better patch guidance, we analyzed the crash more precisely.
I think the test environment does not matter, but if you cannot reproduce the results of the following examples, please try at
which is the same as mine.
Before we deep dive into the root cause, let's take a look at the crash script.
=====[ crash.lua ]=====
x = {} for i=1, 2000 do x[i] = i end
local function f() end
local function g() return f(table.unpack(x)) end
collectgarbage("step") setmetatable({}, {__gc = 1})
g()
==================
There are two points that we need to pay attention to. First, it seems that function g() passes 2000 arguments to function f(), using table.unpack. Second, in setmetatable, __gc metamethod is set to 1. Garbage collection will raise an error (number 1 is not callable) later.
So, Let's run the script... and, BOOM!
=====[ Result of crash.lua ]=====
==11076==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000002214 at pc 0x55bc3aa47195 bp 0x7ffdf6a20e70 sp 0x7ffdf6a20e60 READ of size 4 at 0x602000002214 thread T0
#0 0x55bc3aa47194 in funcnamefromcode /home/user/Project_Lua/lua/ldebug.c:605
#1 0x55bc3aa47b29 in luaG_callerror /home/user/Project_Lua/lua/ldebug.c:738
#2 0x55bc3aa4b61b in luaD_tryfuncTM /home/user/Project_Lua/lua/ldo.c:396
#3 0x55bc3aa4d0be in luaD_precall /home/user/Project_Lua/lua/ldo.c:589
#4 0x55bc3aa4d16c in ccall /home/user/Project_Lua/lua/ldo.c:607
#5 0x55bc3aa4d282 in luaD_callnoyield /home/user/Project_Lua/lua/ldo.c:627
#6 0x55bc3aa5809a in dothecall /home/user/Project_Lua/lua/lgc.c:895
#7 0x55bc3aa498c1 in luaD_rawrunprotected /home/user/Project_Lua/lua/ldo.c:144
#8 0x55bc3aa4ecb1 in luaD_pcall /home/user/Project_Lua/lua/ldo.c:926
#9 0x55bc3aa58694 in GCTM /home/user/Project_Lua/lua/lgc.c:915
#10 0x55bc3aa58996 in callallpendingfinalizers /home/user/Project_Lua/lua/lgc.c:945
===========================
According to the report of address sanitizer, we could find that the problem was heap buffer overflow. Let's inspect the call stack. Everything seems as we expected above, the finalizer and error handling functions are called. Note that you can also find luaD_tryfuncTM function from the call stack, which checks whether '1' has __call metamethod or not. funcnamefromcode function, the topmost function in the call stack, leads to heap buffer overflow. Here's a piece of code from the function.
====[funcnamefromcode in ldebug.c]====
.....
static const char *funcnamefromcode (lua_State *L, CallInfo *ci, const char *name) {
TMS tm = (TMS)0;
const Proto p = ci_func(ci)->p;
int pc = currentpc(ci);
Instruction i = p->code[pc]; // CRASH LINE
.....
==============================
Through analysis, we could find that the commented line leads to the crash, as the index of p→code array is out of bound. In other words, pc value is set too high. The root cause of the problem belongs to luaD_pretailcall, which is used when function g() calls function f() in the crash script. Here's the code.
========[luaV_execute in lvm.c]==============
vmcase(OP_TAILCALL) {
... savepc(ci); / several calls here can raise errors /
if (TESTARG_k(i)){
luaF_closeupval(L, base); / close upvalues from current call /
lua_assert(L->tbclist < base); / no pending tbc variables /
lua_assert(base == ci->func + 1); }
if ((n = luaD_pretailcall(L, ci, ra, b, delta)) < 0) / Lua function? /
...
=====================================
========[luaD_pretailcall in ldo.c]=============
int luaD_pretailcall (lua_State *L, CallInfo ci, StkId func, int narg1, int delta) {
retry:
switch (ttypetag(s2v(func))) {
...
case LUA_VLCL: {/ Lua function */
Proto p = clLvalue(s2v(func))->p;
int fsize = p->maxstacksize; / frame size /
int nfixparams = p->numparams;
int i; ci->func -= delta; / restore 'func' (if vararg) /
for (i = 0; i < narg1; i++) / move down function and arguments /
setobjs2s(L, ci->func + i, func + i);
checkstackGC(L, fsize);
func = ci->func; / moved-down function /
for (; narg1 <= nfixparams; narg1++)
setnilvalue(s2v(func + narg1)); / complete missing arguments /
ci->top = func + 1 + fsize; / top for new function /
lua_assert(ci->top <= L->stack_last);
ci->u.l.savedpc = p->code; / starting point /
ci->callstatus |= CIST_TAIL;
L->top = func + narg1; / set top */
return -1;
....
=========================================
As you see in luaV_execute, current pc of function g() is saved to ci→u.l.savedpc. Lua normally saves pc before calling other functions, to handle possible errors that may occur inside the callee. In luaD_pretailcall function, as function f() is tail-called by function g(), g() is no longer needed. function f() and its arguments are copied to Callinfo variable ci of g() in the code below.
=========================================
for (i = 0; i < narg1; i++) / move down function and arguments /
setobjs2s(L, ci->func + i, func + i);
=========================================
However, right after copying those values, checkstackGC macro is executed, which possibly can call garbage collection related functions. Crash code exactly does garbage collection within the macro, and further calls erroneous finalizer function. Finalizer is set as a number value(1) in the crash code, so Lua internally calls a series of error handling functions. Throughout getting error information from the code, Lua calls currentpc which is implemented as below.
static int currentpc (CallInfo *ci) { lua_assert(isLua(ci)); return pcRel(ci->u.l.savedpc, ci_func(ci)->p); }
At this moment, ci→u.l.savedpc is set as pc of function g() in luaV_execute function before calling luaD_pretailcall function. However, ci_func(ci)→p refers to function f() because it was copied just before the checkstackGC macro. This eventually returns a relative distance between two different functions(f() and g()), not from the same function, and finally leads to heap overflow.
- Patch Suggestion
Although the reason for the crash is simple, it was hard to patch. We have to change lots of code to handle this special case - finalizer being called during luaD_pretailcall function. To solve the problem, we devised 3 different patches.
Patch1 - Updating tail-called function f() after checkstackGC(L, fsize) macro is executed
In luaD_pretailcall function, function f() and its arguments are copied to the Lua stack with the following code.
=========================================
for (i = 0; i < narg1; i++) / move down function and arguments / setobjs2s(L, ci->func + i, func + i);
checkstackGC(L, fsize);
=========================================
Instead, we only copy arguments to the Lua stack and copy function f() to Lua stack right after executing checkstackGC macro as below. ("o" is just a temporary variable to store f(). ci = L→ ci in line 5 is added to prevent ci pointing to freed area, which can happen when the Lua stack is reallocated during checkstackGC macro)
=========================================
TValue o = (s2v(func)); for (i = 1; i < narg1; i++) / move down function and arguments */
setobjs2s(L, ci->func + i, func + i);
checkstackGC(L, fsize); ci = L->ci; setobj2s(L, ci->func, &o)
=========================================
In this way, when erroneous finalizer is called, error information is searched only from function g(), not from both f() and g() as g() remains to Lua stack during the execution of checkstackGC macro. However, the patch has a slight flaw. Although the error is not propagated as the finalizer in Lua is called with protected mode, using gdb, we found that the error message was represented as below.
==========================================
"crash.lua:10: attempt to call a number value (upvalue 'f')"
==========================================
Though it is correct that the error happens by calling a number value, but the sentence "... call a number value (upvalue 'f') " doesn't seem to be properly giving error information.
Patch2 - Modifiying error handling functions
There are two main obstacles during error handling in calling the finalizer. The first one is funcnamefromcode in luaG_callerror function. funcnamefromcode function internally use currentpc, which is the problem itself. However, we can use a wrapper function - getfuncname function rather than funcnamefromcode function. getfuncname function return "__gc metamethod" when CIST_FIN flag is set in ci→callstatus, otherwise leading to funcnamefromcode function.
======[patch, luaG_callerror in ldebug.c]=========
l_noret luaG_callerror (lua_State *L, const TValue *o) { CallInfo *ci = L->ci; const char name = NULL; / to avoid warnings */
const char*kind = (isLua(ci)) ? funcnamefromcode(L, ci &name) : NULL; // Removed
const char *kind = (isLua(ci)) ? getfuncname(L, ci, &name) : NULL; // New
const char *extra = kind ? formatvarinfo(L, kind, name) : varinfo(L, o);
typeerror(L, o, "call", extra); }
======================================
The second one is typeerror function in luaG_callerror function. The function lead to luaG_runerror, and luaG_runerror function use getcurrentline() function to get information of line number. As the function in current callinfo is treated as Lua function, the function tries to get the line number of the call info based on relative pc - which is also the problem itself. So, to solve the problem, we added a logic to check whether the value of pc in callinfo and savedpc both belongs to one single function or not. If not, we treat the case as abnormal and return -1 as line number.
===[patch, getcurrentline in ldebug.c]=========
static int getcurrentline (CallInfo ci) {
Proto p = ci_func(ci)->p;
int rel_pc = currentpc(ci);
if(l_unlikely(p->sizecode < rel_pc)) {
return -1; // NOT IN BOUNDARY OF ONE FUNCTION
}
else{
return luaG_getfuncline(p, rel_pc);
} }
=====================================
The result is quite reasonable. However, this patch needs much fix on Lua source code.
======[result of running crash.lua after patch 2] ======
crash.lua:-1: attempt to call a number value (metamethod '__gc')
===========================================
Patch3 - appending CIST_C value to callstatus before checkstackGC is executed and remove it right after the execution
================================
ci→callstatus |= CIST_C;
checkstackGC(L, fsize);
ci→callstatus &= ~CIST_C;
================================
If CIST_C value is ORed to ci→callstatus, Lua doesn't try to get information from neither Lua function "g" nor "f", as it recognizes current ci is from C function. Specifically, we can avoid calling funcnamefromcode(ldebug.c:600) in luaG_callerror(ldebug.c:735)
l_noret luaG_callerror (lua_State *L, const TValue *o) { CallInfo *ci = L->ci; const char name = NULL; / to avoid warnings */ const char *kind = (isLua(ci)) ? funcnamefromcode(L, ci, &name) : NULL; const char *extra = kind ? formatvarinfo(L, kind, name) : varinfo(L, o); typeerror(L, o, "call", extra); }
This patch obviously handles crash efficiently, but as ci is not related to C function in the crash, it may not be the best option to fix the problem.
Found by: Minseok Kang, Jihoi Kim, HyungChan Kim
Team Nil Armstrong
-- Regards, Minseok Kang