> I am afraid I was wrong (or not exactly right...). In case of errors the
> nCcalls count may become wrong. (A pcall saves nCcalls and restores it
> to its original value in case of errors; counts for call/return in other
> threads between the pcall and the error will be lost.)
>

Roberto, I was formulating a response when I got the above. I'll
include my response anyway since I have some suggestions:


> > That will simply postpone the error. Our system is designed to run for
> > years and the ceiling will eventually be hit.
>
> Only if you have an unlimited number of suspended threads, or if you are
> killing suspended Lua threads with unreturned calls. If all threads
> eventually terminate, the nCcall count will be eventually decremented.

Roberto, this is simply not correct, please look carefully at the
following sequence:

System boot: nCcalls is initially 0

T1 -> luaD_rawrunprotected -> save 0 in oldnCcalls and increment
nCcalls -> Lua Script -> C function that suspends

T2 -> luaD_rawrunprotected -> save 1 in oldnCcalls and increment
nCcalls -> Lua Script -> C function that suspends

T1 resumes -> luaD_rawrunprotected restores nCcalls to 0

T2 resumes -> luaD_rawrunprotected restores nCcalls to 1

In other words, with multiple threads that can suspend as shown
above, nCcalls drifts upward by at least one for each such sequence.
The problem is that the second thread (T2) saves nCcalls on its C
stack after the first thread (T1), which is still suspended, has
already incremented it. When T1 resumes, nCcalls is correctly restored
to 0, but when T2 resumes it is restored to 1, which is wrong because
no calls are outstanding at that point.

Since the system keeps running and this threading pattern keeps
occurring, the global nCcalls variable keeps increasing until the
system fails.
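
The sequence above can be replayed in a small standalone C sketch. It
does not use the Lua sources: the counter, the ProtectedCall struct
and the function names are made up for illustration, and the
save/increment/restore steps are modelled only as described in the
sequence above, not copied from luaD_rawrunprotected.

```c
/* Stand-in for the shared call counter; NOT the real Lua global state. */
unsigned short nCcalls = 0;

/* One protected call in flight. It remembers the counter value seen on
   entry, the way a local variable on that thread's C stack would. */
typedef struct {
    unsigned short oldnCcalls;
} ProtectedCall;

/* Entry: save the current counter, then count the new C call. */
void protected_enter(ProtectedCall *pc) {
    pc->oldnCcalls = nCcalls;
    nCcalls++;
}

/* Exit: restore the counter to the value saved on entry. */
void protected_leave(const ProtectedCall *pc) {
    nCcalls = pc->oldnCcalls;
}

/* Replay the T1/T2 interleaving `rounds` times, starting from 0,
   and return the final counter value. */
unsigned short simulate_rounds(int rounds) {
    nCcalls = 0;
    for (int i = 0; i < rounds; i++) {
        ProtectedCall t1, t2;
        protected_enter(&t1);  /* T1 saves n, counter is n + 1, T1 suspends */
        protected_enter(&t2);  /* T2 saves n + 1, counter is n + 2, T2 suspends */
        protected_leave(&t1);  /* T1 resumes: counter restored to n */
        protected_leave(&t2);  /* T2 resumes: counter restored to n + 1 */
    }
    return nCcalls;
}
```

In this model simulate_rounds(1) returns 1 and simulate_rounds(1000)
returns 1000, although no call is outstanding at the end of any round.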

FYI, we have an embedded system, where all threads are created at boot
time. These threads never terminate.


> > What is the logic behind this statement?
>
> See the reported bug. If all coroutines share a single thread (the
> default in Lua), nested coroutines would overflow its stack.

OK, so a bunch of wrongly used coroutines may overflow the C stack.
'May' is not as fatal as 'will': using multiple threads will make the
global nCcalls fail.

I recommend that you make the location of nCcalls a compile-time
option if you think it is important to protect against the unlikely
case where wrongly used coroutines may overflow the C stack. In other
words, luaconf.h should include an option for keeping nCcalls either
in the global state or in lua_State. This should be very easy since
you are already using macros for getting to the global state when
incrementing and decrementing nCcalls.
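
A minimal sketch of what such a switch could look like, written
against made-up types rather than the real Lua headers. The option
name DEMO_NCCALLS_PER_THREAD, the DemoGlobal/DemoThread structs and
the demo_* functions are all hypothetical; none of them exist in
luaconf.h or the Lua sources.

```c
/* Hypothetical build option: 1 = counter per thread, 0 = one shared counter.
   Override with -DDEMO_NCCALLS_PER_THREAD=0 to compare the two layouts. */
#ifndef DEMO_NCCALLS_PER_THREAD
#define DEMO_NCCALLS_PER_THREAD 1
#endif

/* Stand-ins for the global state and the per-thread state. */
typedef struct DemoGlobal { unsigned short nCcalls; } DemoGlobal;
typedef struct DemoThread { DemoGlobal *g; unsigned short nCcalls; } DemoThread;

/* Single accessor macro: all code below goes through it, so the
   option only has to change this one definition. */
#if DEMO_NCCALLS_PER_THREAD
#define demo_nCcalls(t) ((t)->nCcalls)
#else
#define demo_nCcalls(t) ((t)->g->nCcalls)
#endif

/* Enter a protected call: return the saved value, then count the call. */
unsigned short demo_enter(DemoThread *t) {
    unsigned short old = demo_nCcalls(t);
    demo_nCcalls(t)++;
    return old;
}

/* Leave a protected call: restore the value saved on entry. */
void demo_leave(DemoThread *t, unsigned short old) {
    demo_nCcalls(t) = old;
}

/* Run T1 enter, T2 enter, T1 leave, T2 leave and return the sum of all
   counters afterwards. Zero means nothing leaked. */
unsigned demo_drift_after_interleaving(void) {
    DemoGlobal g = {0};
    DemoThread t1 = {&g, 0}, t2 = {&g, 0};
    unsigned short old1 = demo_enter(&t1);
    unsigned short old2 = demo_enter(&t2);
    demo_leave(&t1, old1);
    demo_leave(&t2, old2);
    return (unsigned)g.nCcalls + t1.nCcalls + t2.nCcalls;
}
```

With the per-thread setting (the default in this sketch) the function
returns 0; built with -DDEMO_NCCALLS_PER_THREAD=0 it returns 1, which
is the leftover count described earlier.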


> > That will simply postpone the error. Our system is designed to run for
> > years and the ceiling will eventually be hit.
>
> Only if you have an unlimited number of suspended threads, or if you are
> killing suspended Lua threads with unreturned calls. If all threads
> eventually terminate, the nCcall count will be eventually decremented.

Yes, the above would have caused the error, but as explained in my
analysis above, this error also occurs with only two threads that
never terminate.

>  But please understand that coroutines (and Lua in
> general) were designed to work with single threads.

Yes, I understand, but mapping coroutines to threads is extremely
useful and makes certain types of design much easier, because code can
suspend a thread/coroutine (blocking mode) instead of always having to
yield a coroutine (asynchronous mode). Most embedded systems include
an RTOS whose threading is designed differently from that of a regular
operating system. For example, the RTOS may be designed such that
threads on the same priority level are not preempted unless the
running thread specifically yields. Threading in an RTOS is not always
for multitasking; it is also used to simplify design, since a thread
is an execution path with its own state (stack).

-Will