I found the problem. It is a change in 5.2 that causes our
multi-threaded system to fail.

I recently moved from 5.1 to 5.2 and this is when I noticed the
problem. In Lua 5.1, the nCcalls variable was a member of lua_State.
In 5.2, you have moved the variable into the global state.

I have a system where each coroutine is mapped to a native C thread.
Only one C thread is allowed to execute in the Lua VM at any time.
This is controlled by a global mutex. However, Lua code can call a C
function that releases this mutex while the C function executes. This
C function may block for various reasons, leaving the C stack and the
Lua coroutine stack frozen until the function returns. In the
meantime, another event may occur and another "C thread/coroutine"
combination may run. This worked great in 5.1 since each Lua coroutine
state kept all data for the call in lua_State. I have thousands of
runtime hours in 5.1 that confirmed that our threading construction
worked. However, this is no longer the case (at least not with
nCcalls), since the now-global nCcalls variable holds the wrong value
for a state whenever several native threads are suspended mid-call.

For example, when luaD_rawrunprotected is called, you do:

    unsigned short oldnCcalls = G(L)->nCcalls;
    ...... (in our system, the Lua code executing here may call a C
            function that temporarily releases the mutex and suspends)
    G(L)->nCcalls = oldnCcalls;

This construction fails when several threads block inside such calls:
by the time "G(L)->nCcalls = oldnCcalls" runs, the saved oldnCcalls is
no longer valid for the shared global state.

For example, a native thread executes, luaD_rawrunprotected is called,
nCcalls is incremented, and the script is run. This script calls a C
function that releases the mutex and suspends the native thread and
coroutine stack. Now another coroutine/thread combination executes and
saves nCcalls in oldnCcalls. The first thread has already incremented
the global nCcalls by at least one, so the second thread saves a value
that is too high, and the two restores no longer pair up. This would
not have been a problem if each Lua state had its own nCcalls
variable, as in Lua 5.1.
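
To make the interleaving concrete, here is a minimal simulation of the scenario above. It is only a sketch: SimGlobal, SimState, sim_enter, sim_leave and sim_interleave are hypothetical names invented for illustration, not Lua's real structures or functions. It also folds the increment into the save step for brevity, and replays the thread switches in a fixed order instead of using real native threads.

```c
/* Hypothetical stand-ins for illustration; not Lua's real structures. */
typedef struct {
    unsigned short nCcalls;      /* shared counter, as described for 5.2 */
} SimGlobal;

typedef struct {
    SimGlobal *g;                /* global state shared by all coroutines */
    unsigned short nCcalls;      /* per-coroutine counter, as in 5.1 */
    unsigned short old_shared;   /* shared value saved on entry */
    unsigned short old_own;      /* per-state value saved on entry */
} SimState;

/* Entry half of a protected call: save both counters, then count the call. */
static void sim_enter(SimState *s) {
    s->old_shared = s->g->nCcalls;
    s->old_own = s->nCcalls;
    s->g->nCcalls++;
    s->nCcalls++;
}

/* Exit half: restore whatever was saved on entry. */
static void sim_leave(SimState *s) {
    s->g->nCcalls = s->old_shared;
    s->nCcalls = s->old_own;
}

/* Replays one interleaving: A enters and blocks, B enters and blocks,
 * A resumes and returns, B resumes and returns.
 * Returns the final shared counter; the per-state counters are
 * written to *own_a and *own_b. */
static unsigned short sim_interleave(unsigned short *own_a,
                                     unsigned short *own_b) {
    SimGlobal g = {0};
    SimState a = {&g, 0, 0, 0};
    SimState b = {&g, 0, 0, 0};

    sim_enter(&a);   /* A saves 0; shared counter becomes 1 */
    sim_enter(&b);   /* B saves 1 (already polluted by A); shared becomes 2 */
    sim_leave(&a);   /* A restores 0 while B is still inside its call */
    sim_leave(&b);   /* B restores 1 although no call is active anymore */

    *own_a = a.nCcalls;
    *own_b = b.nCcalls;
    return g.nCcalls;
}
```

With this ordering, sim_interleave returns 1 for the shared counter even though no call is active at the end, while both per-state counters are back at 0.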

Please consider moving nCcalls back into lua_State. I think the only
way for me to continue using 5.2 right now is to disable the checking
of the nCcalls variable.

I have only analyzed the use of nCcalls in the new global structure
"global_State". I am not sure if any of the other variables in the new
global structure could cause other problems in a native thread mapped
coroutine system.

-Will