lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Does this happen due to the 1000x for loop, or also if you reduce the
for loop count to small counts? (Are you sure that you are not running
out of memory / at which loop count value will this typically
happen?).

On Mon, Nov 29, 2021 at 11:08 AM 김지회 <pascal4847@gmail.com> wrote:
>
> Hello, List!
>
>    Recently, my friends and I found interesting crashes in Lua Interpreter. In this mail, we’ll talk about the root cause of the crashes and how we analyzed them. Moreover, we find that using this problem, we can do Sandbox Escape so that execute ‘/bin/sh’ without os.execute and io.popen in Ubuntu 20.04. As the root cause is related to the logic of garbage collection, if you’re not familiar with the internal logic of LuaGC, we recommend you to take a look at the workshop PPT in 2018 for clear comprehension.
>
> Link: https://www.lua.org/wshop18/Ierusalimschy.pdf
>
>    Before we deep dive into the root cause and sandbox escape PoC, we’ll show some interesting crashes, for example. I think the test environment does not matter, but if you cannot reproduce the results of the following examples please try at
>
> - OS : Ubuntu 20.04 LTS
>
> - glibc : UBUNTU GLIBC 2.3.1
>
> - Lua : Lua 5.4.4 (commit hash ad3942adba574c9d008c99ce2785a5af19d146bf)
>
>    which is the same as mine. Or, you can just use the docker file we created to build PoC. The PoC link is at the third part of this mail. The following contents consist of four parts.
>
> 1. Understanding crashes.
>
> 2. Defining the root cause of the crashes.
>
> 3. Sandbox Escape Exploit & PoC.
>
> 4. Finally, we will show a simple patch to fix the problem.
>
> Now, let’s start!
>
> ----------------------------------------------------------------------------
>
> Part 1. Understanding crashes.
>
>    Although we found various crashes related to this problem, the following example is the best among them to understand this problem.
>
>
>
> ---------- [ crash1.lua ] -----------
>
> function func0 ()
>
> collectgarbage("step")
>
> end
>
>
>
> function func1 ()
>
> local func1_table = {}
>
> local func1_meta = { __gc = func0 }
>
> setmetatable(func1_table, func1_meta)
>
> collectgarbage("step" , 1, func1_table)
>
> end
>
>
>
> function func2 ()
>
> local func2_table = {}
>
> local func2_meta = { __gc = func1 }
>
> setmetatable(func2_table, func2_meta)
>
> end
>
>
>
> for i = 0,1000,1 do
>
> func2()
>
> end
>
>
>
> ------------------------------------------------
>
>    Wow, It looks horrible. Let’s put the script into Lua Interpreter with address sanitizer... and, Boom!
>
>
>
> ---------- [ Result of crash1.lua ] -----------
>
> root@Newbie:/home/Sandbox/temp/lua# ./lua ../crash1.lua
>
> AddressSanitizer:DEADLYSIGNAL
>
> ==170305==ERROR: AddressSanitizer: SEGV
>
> on unknown address 0x00000005 (pc 0x5657e511 bp 0xf51012a0 sp 0xfff8f3b0 T0)
>
> ==170305==The signal is caused by a READ memory access.
>
> ==170305==Hint: address points to the zero page.
>
> #0 0x5657e510 in separatetobefnz (/home/Sandbox/temp/lua/lua+0x29510)
>
> #1 0x565838a9 in atomic (/home/Sandbox/temp/lua/lua+0x2e8a9)
>
> #2 0x56586781 in luaC_step (/home/Sandbox/temp/lua/lua+0x31781)
>
> #3 0x5656b7cd in lua_gc (/home/Sandbox/temp/lua/lua+0x167cd)
>
> #4 0x565dd9da in luaB_collectgarbage (/home/Sandbox/temp/lua/lua+0x889da)
>
> #5 0x565778b9 in luaD_precall (/home/Sandbox/temp/lua/lua+0x228b9)
>
> #6 0x565b1239 in luaV_execute (/home/Sandbox/temp/lua/lua+0x5c239)
>
> #7 0x56578998 in luaD_callnoyield (/home/Sandbox/temp/lua/lua+0x23998)
>
> #8 0x5658003b in dothecall (/home/Sandbox/temp/lua/lua+0x2b03b)
>
> #9 0x5657394d in luaD_rawrunprotected (/home/Sandbox/temp/lua/lua+0x1e94d)
>
> #10 0x56579789 in luaD_pcall (/home/Sandbox/temp/lua/lua+0x24789)
>
> #11 0x5657fcf8 in GCTM (/home/Sandbox/temp/lua/lua+0x2acf8)
>
> ....
>
> #231 0x5657fcf8 in GCTM (/home/Sandbox/temp/lua/lua+0x2acf8)
>
> #232 0x5658468f in singlestep (/home/Sandbox/temp/lua/lua+0x2f68f)
>
> #233 0x56586756 in luaC_step (/home/Sandbox/temp/lua/lua+0x31756)
>
> #234 0x5656b7cd in lua_gc (/home/Sandbox/temp/lua/lua+0x167cd)
>
> #235 0x565dd9da in luaB_collectgarbage (/home/Sandbox/temp/lua/lua+0x889da)
>
> #236 0x565778b9 in luaD_precall (/home/Sandbox/temp/lua/lua+0x228b9)
>
> #237 0x565b1239 in luaV_execute (/home/Sandbox/temp/lua/lua+0x5c239)
>
> #238 0x56578998 in luaD_callnoyield (/home/Sandbox/temp/lua/lua+0x23998)
>
> #239 0x5658003b in dothecall (/home/Sandbox/temp/lua/lua+0x2b03b)
>
> #240 0x5657394d in luaD_rawrunprotected (/home/Sandbox/temp/lua/lua+0x1e94d)
>
> #241 0x56579789 in luaD_pcall (/home/Sandbox/temp/lua/lua+0x24789)
>
> #242 0x5657fcf8 in GCTM (/home/Sandbox/temp/lua/lua+0x2acf8)
>
> #243 0x5658468f in singlestep (/home/Sandbox/temp/lua/lua+0x2f68f)
>
> #244 0x56586756 in luaC_step (/home/Sandbox/temp/lua/lua+0x31756)
>
> #245 0x5656b7cd in lua_gc (/home/Sandbox/temp/lua/lua+0x167cd)
>
> #246 0x565dd9da in luaB_collectgarbage (/home/Sandbox/temp/lua/lua+0x889da)
>
>
>
> AddressSanitizer can not provide additional info.
>
> SUMMARY: AddressSanitizer: SEGV (/home/Sandbox/temp/lua/lua+0x29510) in separatetobefnz
>
> ==170305==ABORTING
>
>
>
> ---------------------------------------------
>
>    Hmm. It’s a segmentation fault error. According to the call stack, It seems like the problem is in separatetobefnz function. Let’s take a look at the code.
>
>
>
> ---------- [ separatetobefnz in lgc.c] -----------
>
> static void separatetobefnz (global_State *g, int all) {
>
> GCObject *curr;
>
> GCObject **p = &g->finobj;
>
> GCObject **lastnext = findlast(&g->tobefnz);
>
> while ((curr = *p) != g->finobjold1) { /* traverse all finalizable objects */
>
> lua_assert(tofinalize(curr));
>
> if (!(iswhite(curr) || all)) /* not being collected? */
>
> p = &curr->next; /* don't bother with it */
>
> else {
>
> if (curr == g->finobjsur) /* removing 'finobjsur'? */
>
> g->finobjsur = curr->next; /* correct it */
>
> *p = curr->next; /* remove 'curr' from 'finobj' list */
>
> curr->next = *lastnext; /* link at the end of 'tobefnz' list */
>
> *lastnext = curr;
>
> lastnext = &curr->next;
>
> }
>
> }
>
> }
>
>
>
> ------------------------------------------------------
>
>    By some experiments, we found that curr variable is NULL so that the iswhite function is forced to refer invalid address (0x00000005). You can find the logic in line 6 and 8 of the function. Maybe the linked list in finobj is broken, so that the curr variable is corrupted. To analyze the exact state of the linked list, we wrote some custom codes to debug the problem.
>
>
>
> --------- [ Results of custom debugging log] ---------------
>
> -----[116th SEPARATE CALL]
>
> [000]0xf4f2adf0 > [001]0xf4f2ae80 > [002]0xf4f2af10 > [003]0xf4f2afa0 > [004]0xf4f2b030 >
>
> [005]0xf4f2b0c0 > [006]0xf4f2b150 > [007]0xf4f2b1e0 > [008]0xf4f2b270 > [009]0xf4f2b300 >
>
> .....
>
> [190]0xf4f24790 > [191]0xf4f24820 > [192]0xf4f248b0 > [193]0xf4f24940 > [194]0xf4f249d0 >
>
> [195]0xf4f24a60 > [196]0xf4f01cc0 > [197]0xf4f01d50 > [198]0xf4f01e10 > [199]0xf4f012a0 >
>
> [200] (nil)
>
> -----[FINOBJOLD1 0xf4f2adf0] [FINOBJROLD 0xf4f2adf0]
>
>
>
> [++++] 0xf4f2c500 freed (g->finobjold1: 0xf4f2adf0 | g->finobjrold: 0xf4f2adf0)
>
> [++++] 0xf4f2adf0 freed (g->finobjold1: 0xf4f2adf0 | g->finobjrold: 0xf4f2adf0)
>
>
>
> -----[117th SEPARATE CALL]
>
> [000]0xf4f2ae80 > [001]0xf4f2af10 > [002]0xf4f2afa0 > [003]0xf4f2b030 > [004]0xf4f2b0c0 >
>
> [005]0xf4f2b150 > [006]0xf4f2b1e0 > [007]0xf4f2b270 > [008]0xf4f2b300 > [009]0xf4f2b390 >
>
> .....
>
> [190]0xf4f24820 > [191]0xf4f248b0 > [192]0xf4f24940 > [193]0xf4f249d0 > [194]0xf4f24a60 >
>
> [195]0xf4f01cc0 > [196]0xf4f01d50 > [197]0xf4f01e10 > [198]0xf4f012a0 > [199] (nil)
>
> -----[FINOBJOLD1 0xf4f2adf0] [FINOBJROLD 0xf4f2adf0]
>
>
>
> AddressSanitizer: SEGV(0x00000005)
>
>
>
> ------------------------------------------------------------------
>
>    Wow. It seems that finobjold1 and finobjrold are not set to a proper variable when the object they point is freed. That makes line 32 of separatetobefnz function unable to find the break condition of the loop so that finally visit the next node of the last element in finobj linked list (which is nil). Now, let’s deep dive into the crash to find the root cause.
>
>    Using gdb and static analysis, we find that separatetobefnz function is called from stepgenfull function when it leads to the crash. ( stepgenfull -> atomic -> separatetobefnz ). Then, we can find something weird in stepgenfull function.
>
>
>
> --------- [ stepgenfull in lgc.c] -------------
>
> static void stepgenfull (lua_State *L, global_State *g) {
>
> lu_mem newatomic; /* count of traversed objects */
>
> lu_mem lastatomic = g->lastatomic; /* count from last collection */
>
> if (g->gckind == KGC_GEN) /* still in generational mode? */
>
> enterinc(g); /* enter incremental mode */
>
> luaC_runtilstate(L, bitmask(GCSpropagate)); /* start new cycle */
>
> newatomic = atomic(L); /* mark everybody */
>
> if (newatomic < lastatomic + (lastatomic >> 3)) { /* good collection? */
>
> atomic2gen(L, g); /* return to generational mode */
>
> setminordebt(g);
>
> }
>
> else { /* another bad collection; stay in incremental mode */
>
> g->GCestimate = gettotalbytes(g); /* first estimate */;
>
> entersweep(L);
>
> luaC_runtilstate(L, bitmask(GCSpause)); /* finish collection */
>
> setpause(g);
>
> g->lastatomic = newatomic;
>
> }
>
> }
>
>
>
> ------------------------------------------------------------------
>
>    Yes. line 6 (atomic function call) of the stepgenfull function should be run in incremental mode. So, separatetobefnz function of the crash must have nil value in both finobjold1 and finobjrold! Why the variables are not nil even after enterinc function? The answer is that, actually, the atomic function is called in generational mode, not incremental mode. It is obvious that such flow is not considered as normal, as comments say. Then, How? it seems that there is no chance g->gckind to be generational mode after enterinc function (line 4).
>
>    After some analysis, we found that runtilstate function can change the value of g->gckind, so that the weird flow happens. You can check the phenomenon by modifying stepgenfull function into the following code for debugging.
>
>
>
> --------- [ stepgenfull, modified for analyze ] -------------
>
> static void stepgenfull (lua_State *L, global_State *g) {
>
> lu_mem newatomic;
>
> lu_mem lastatomic = g->lastatomic;
>
> if (g->gckind == KGC_GEN)
>
> enterinc(g);
>
> if (g->gckind == KGC_GEN) printf(“[!] luaC_runtilstate does not start with KGC_INC\n”);
>
> // Above line is added for debugging purpose.
>
> luaC_runtilstate(L, bitmask(GCSpropagate));
>
> if (g->gckind == KGC_INC) printf(“[!] luaC_runtilstate change gckind into KGC_GEN\n”);
>
> // Above line is added for debugging purpose.
>
> newatomic = atomic(L);
>
> .....
>
>
>
> --------- [ Result of stepgenfull, modified for analyze ] -------
>
> root@Newbie:/home/Sandbox/temp/lua# ./lua ../crash1.lua
>
> [!] luaC_runtilstate Change gckind into KGC_GEN
>
> [!] luaC_runtilstate Change gckind into KGC_GEN
>
> AddressSanitizer:DEADLYSIGNAL
>
> ...
>
>
>
> ------------------------------------------------------------------
>
>    We thought that there is no possible way that runtilstate function change g->gckind, so we needed to go deeper. Thanks to gdb, we found that runafewfinalizers function in singlestep function can change the g->gckind. The reason is quite simple. runafewfinalizers function calls GCTM function, which can recursively call another garbage collection (this is the reason why crash1.lua puts collectgarbage function inside of gc metamethod). As a result, good collection happened during GCTM function, and gckind was set in generational mode.
>
>    We used the following code to make sure of the analysis.
>
>
>
> --------- [singlestep in lgc.c, modified for analyze ] -------------
>
> ...
>
> case GCScallfin: { /* call remaining finalizers */
>
> if (g->tobefnz && !g->gcemergency) {
>
> g->gcstopem = 0; /* ok collections during finalizers */
>
> int temp = g->gckind; // This line is added for debugging purpose.
>
> work = runafewfinalizers(L, GCFINMAX) * GCFINALIZECOST;
>
> if(g->gckind != temp) printf(“[!] runafewfinalizers change g->gckind\n”);
>
> // Above line is added for debugging purpose.
>
> }
>
> else { /* emergency mode or no more finalizers */
>
> g->gcstate = GCSpause; /* finish collection */
>
> work = 0;
>
> }
>
> ...
>
>
>
> --------- [Result of singlestep in lgc.c, modified for analyze ] -------------
>
> root@Newbie:/home/Sandbox/temp/lua# ./lua ../crash1.lua
>
> [!] runafefinalizers change g->gckind
>
> [!] runafefinalizers change g->gckind
>
> [!] luaC_runtilstate change gckind into KGC_GEN
>
> [!] runafefinalizers change g->gckind
>
> [!] luaC_runtilstate change gckind into KGC_GEN
>
> AddressSanitizer:DEADLYSIGNAL
>
>
>
> ------------------------------------------------------------------
>
>   Part 2. Defining the root cause of the crash.
>
>    So, In short, the root cause is...
>
> 1. singlestep function with case GCScallfin can change the mode of garbage collection, because GCTM can recursively call garbage collection logic.
>
> 2. runtilstate function can internally call singlestep function with case GCScallfin.
>
> 3. Some functions are developed with the assumption that runtilstate cannot change the mode of garbage collection, as stepgenfull function is.
>
> 4. As a result, the functions may run increment mode logic in generational mode.
>
> 5. Such behavior can break linked list in global state, lead to various crashes for example SEGV in separatetobefnz function.
>
>
>
>     There are two ways we find that GCTM function changes the mode of garbage collection in the current global state.
>
> 1. As we showed above, when a good collection occurs during incremental step, the mode can be changed during GCTM.
>
> 2. Explicitly calling garbagecollection(“generational”) can change the mode during GCTM.
>
>
>
>    The following script shows that the second case(explicitly calling mode change) also creates a similar crash.
>
>
>
> ---------- [ crash2.lua ] -----------
>
> setmetatable({}, {
>
> __gc = function()
>
> setmetatable({}, {
>
> __gc = function()
>
> collectgarbage("generational") -- Explicitly change mode
>
> setmetatable({}, {
>
> __gc = function()
>
> collectgarbage("step")
>
> collectgarbage("step")()
>
> end
>
> })
>
> end
>
> })
>
> collectgarbage("step")
>
> end
>
> })
>
>
>
> ---------- [ Result of crash2.lua ] -----------
>
> root@Newbie:/home/Sandbox/temp/lua# ./lua ../../DebugLua/crash2.lua
>
> [!] runafefinalizers change g->gckind
>
> =================================================================
>
> ==171896==ERROR: AddressSanitizer: heap-use-after-free on address 0xf51034fc at pc 0x5656e70e bp 0xffffbf78 sp 0xffffbf68
>
> READ of size 4 at 0xf51034fc thread T0
>
> #0 0x5656e70d in funcnamefromcode (/home/Sandbox/temp/lua/lua+0x1970d)
>
> #1 0x56572068 in luaG_callerror (/home/Sandbox/temp/lua/lua+0x1d068)
>
> #2 0x56575635 in luaD_tryfuncTM (/home/Sandbox/temp/lua/lua+0x20635)
>
> .....
>
> #23 0x5655f5e5 in main (/home/Sandbox/temp/lua/lua+0xa5e5)
>
> #24 0xf767aee4 in __libc_start_main (/lib/i386-linux-gnu/libc.so.6+0x1eee4)
>
> #25 0x5655feb4 in _start (/home/Sandbox/temp/lua/lua+0xaeb4)
>
> .....
>
>
>
> SUMMARY: AddressSanitizer: heap-use-after-free (/home/Sandbox/temp/lua/lua+0x1970d) in funcnamefromcode
>
> .....
>
> ----------------------------------------------------
>
>
>
>    Note that the above crash is not related to sweepgenfull function, but originated from the same reason. As the problem is based on the same root cause, we will not explain details about crash2.lua here. You can easily find that GCTM also change the gckind into KGC_GEN as crash1.lua did, by explicitly calling collectgarbage(“generational”) in gc metamethod.
>
>
>
> Part 3. Sandbox Escape Exploit & PoC.
>
>    Actually, as you can see from the above examples, this problem can corrupt the linked lists of objects in the global state. As a result, we find that we can do “Tcache Poisoning” using this problem. The tcache poisoning can lead to sandbox escape. However, in this mail, we will not explain the details about the exploit (too long to explain).
>
>    You can check the exploit and detailed explanation of it from the Github link below, with docker file.
>
> Github link: https://github.com/Lua-Project/lua-5.4.4-sandbox-escape-with-new-vulnerability
>
>
>
> ------- [Results of Sandbox Escape] ------
>
> root@55bbe1743963:/LUA/lua# /LUA/lua/lua /LUA/exploit.lua
>
> sh: 1: еdUUU: not found
>
> # whoami
>
> root
>
>
>
> ----------------------------------------------------
>
> Part 4. Simple patch to fix the problem. (Suggestion)
>
>    There can be many ways to patch this problem. Maybe we can forbid developers to use garbage collection in gc metamethod. However, we devised the following patch as a solution, without breaking the usability and backward compatibility.
>
> --------- [singlestep in lgc.c BEFORE patch] -------------
>
> ...
>
> case GCScallfin: { /* call remaining finalizers */
>
> if (g->tobefnz && !g->gcemergency) {
>
> g->gcstopem = 0; /* ok collections during finalizers */
>
> work = runafewfinalizers(L, GCFINMAX) * GCFINALIZECOST;
>
> }
>
> else { /* emergency mode or no more finalizers */
>
> g->gcstate = GCSpause; /* finish collection */
>
> work = 0;
>
> }
>
> ...
>
> --------- [singlestep in lgc.c AFTER patch] -------------
>
> ...
>
> case GCScallfin: { /* call remaining finalizers */
>
> if (g->tobefnz && !g->gcemergency) {
>
> g->gcstopem = 0; /* ok collections during finalizers */
>
> work = runafewfinalizers(L, GCFINMAX) * GCFINALIZECOST;
>
> // <<<<<<PATCH LINE START>>>>>>
>
> if(l_unlikely(g->gckind == KGC_GEN)){
>
> int savedgcstate = g->gcstate;
>
> enterinc(g);
>
> g->gcstate = savedgcstate;
>
> }
>
> // <<<<<<PATCH LINE END>>>>>>
>
> }
>
> else { /* emergency mode or no more finalizers */
>
> g->gcstate = GCSpause; /* finish collection */
>
> work = 0;
>
> }
>
> ...
>
>
>
> ----------------------------------------------------
>
>    This patch lets GCTM change the mode of the global state, but when it returns, fix it into the original mode. Note that we shall recover global gcstate, as enterinc function changes the state into GCSPause. Using this patch, crash1.lua and crash2.lua are well interpreted by Lua without error.
>
>
>
> ---------[Results of crash1.lua and crash2.lua after patch] ------------------
>
> root@Newbie:/home/Sandbox/temp/lua# ./lua ../crash1.lua
>
> root@Newbie:/home/Sandbox/temp/lua# ./lua ../crash2.lua
>
> [No Lua error, No address sanitizer error.]
>
>
>
> ---------------------------------------------------------------------------------------
>
>    However, as We're not an expert on garbage collection logic, this patch may have unexpected side effects. We hope someone will solve the problem properly if our patch is not suitable.
>
>
>
> ---------------------------------------------------------------------------------------
>
>    Thank you for reading. Any comments are welcomed. If you have a problem with reproducing the error or understanding analysis, feel free to comment in this thread.
>
>
>
> Found by: Jihoi Kim, Sunghun Oh, Minseok Kang, MinJoong Kim, WooSun Kang, HyungChan Kim
>
> Team Nil Armstrong
>
>
>
> -- Regards, Jihoi.
>
>