[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: likely regressions for debugging hooks x lua_resume() in 5.4
- From: Aurelien DARRAGON <adarragon@...>
- Date: Thu, 7 Sep 2023 22:25:46 +0200
> Hi list!
>
> While playing with lua debug hooks in 5.4, I observed 2 strange
> behaviors that didn't exist in 5.3.
>
> I tried to isolate both issues and got 2 reproducers that should
> hopefully help to understand them.
[...]
> How should we do to properly yield from a hook function and allow the
> following resume() call to properly finish the lua execution, like if
> nothing happened (from lua's script point of view)?
[...]
Ok so for your interest, I was able to circumvent the second issue by
fixing the code making use of the Lua API, since I'm now convinced that
it was misusing lua_resume() in the first place.
Indeed, on the initial lua_resume() call, the nargs parameter is
still set to the number of arguments pushed on the stack prior to
calling it. But then, once a yield has been performed, I call
lua_resume() with nargs being 0 this time (because I didn't push new
arguments on the stack), considering that the pushed arguments from the
initial call were already taken into account.
Here is the "patched" version of the second reproducer named
"lua_missing_stack_arguments.c" that I linked in my previous email:
> ***************
> *** 21,22 ****
> --- 21,23 ----
> int ret, nres;
> + int nargs = 1;
>
> ***************
> *** 26,28 ****
> #if defined(LUA_VERSION_NUM) && LUA_VERSION_NUM >= 504
> ! ret = lua_resume(L, NULL, 1, &nres);
> #else
> --- 27,29 ----
> #if defined(LUA_VERSION_NUM) && LUA_VERSION_NUM >= 504
> ! ret = lua_resume(L, NULL, nargs, &nres);
> #else
> ***************
> *** 35,36 ****
> --- 36,38 ----
> case LUA_YIELD:
> + nargs = 0;
> goto resume;
Doing this way seems to make lua_resume() happy and doesn't not conflict
with the improvements from
https://github.com/lua/lua/commit/58aa09a0b91cf81779d6710d7f9d855bb9d3712f
Maybe this corner case should be documented in the Lua documentation
(after a yield nbargs should be set according to the number of pushed
arguments AFTER the yield prior to calling resume() again?) given that
this is something that "used to work by accident", or at least to not
fail in 5.3 and now results in somewhat unexpected behavior in 5.4, and
could lead to tricky bugs when upgrading from 5.3 from 5.4 with existing
scripts.
For a bit of history, this is what we originally faced at haproxy when
thoroughly testing the 5.4 stack with existing implementation and
scripts: we would sometime get unexpected lua errors in the middle of
script execution when resuming after a yield triggered from a debugging
hook after X instructions, which is something we heavily depend on
within haproxy lua wrapper to make sure we don't spend too much time in
a single lua_resume() call and to be able to interrupt the call, do some
other stuff, and then resume the script execution a bit later.
> In 5.3's lua_resume() implementation, we have an explicit nCcalls
> decrement plus and extra assert at the end of the function, so I was
> wondering if this could've been simply overlooked in 5.4?
> Thus, I tried adding the 'nCcalls--' instruction that I thought was
> missing at the end of the lua_resume() function, and it seemed to have
> fixed the "C stack overflow" errors I was getting.
> But I can't be 100% sure that this is the correct way to fix it since
> not decrementing nCcalls here could've been intentional in the first place?
Now for the first raised issue ("C stack overflow" errors), I still
believe that this looks like a bug or undesired regression in lua 5.4
implementation, because interrupting and resuming multiple times (in a
loop manner, no recursion involved) from the same parent coroutine seems
to be a pretty valid use-case, is it?
Regarding the provided reproducer named "lua_c_stack_overflow.c", it
doesn't have to be yield triggered by debug hooks, the same can be
achieved by directly yielding from the script (ie: manual yield each OP
in a loop, "C stack overflow" will be triggered after LUAI_MAXCCALLS
yields if the same <from> is used for each lua_resume() calls)
Best regards,
Aurelien