I've been using Lua for quite some time - and it's a hugely impressive
language especially in embedded firmware :-)
I have a problem that appears to be some kind of garbage collection (?)
issue. The situation is as follows:
- There is one "Lua Universe" from which multiple running Lua
states are produced by calling lua_newthread.
- I have applied the lua-users locking extensions, as described at
http://lua-users.org/wiki/ThreadsTutorial
- The LuaLock and LuaUnlock functions make use of a global mutex
so that only one OS thread can be running Lua at any one time
- There are about 12 different Lua states, all running in their own
OS thread.
- Each thread running C++ invokes Lua through lua_pcall
- There is quite a large amount of C/C++ extensions added into Lua
that are called from within each Lua context
- This is probably just a side detail and not relevant.
- Every now and then the asserts fail in lvm.c's luaV_execute
- Typically, base no longer is equal to L->base (but
L->ci->base is the same as L->base)
- It does not appear to be down to buffer overruns (I wrapped
base in a protected set of variables to look for changes). If anything,
it is L->base that is changing (not base itself).
- Sometimes, base points to stale memory (I have full MISRA C
runtime checking enabled in my environment) - as if it was once valid,
but no longer is.
- Eventually Lua goes stale:
- Sometimes an OS thread crashes (though I can't prove it's the
same issue at the moment)
- Sometimes a while..forever loop inside a Lua state will go AWOL - and
variables will be pointing to the wrong place (as if the stack has
changed unexpectedly)
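A minimal sketch of the locking arrangement described above, assuming POSIX threads purely for illustration (the firmware's real mutex API is not shown in this post). LuaLock and LuaUnlock are the hook names from the ThreadsTutorial page; g_lua_mutex, g_lua_lock_depth and LuaLockDepth are hypothetical names added for the sketch.

```c
#include <assert.h>
#include <pthread.h>

/* Forward declaration so the sketch compiles without the Lua headers. */
typedef struct lua_State lua_State;

/* One mutex shared by every Lua state created from the same universe. */
static pthread_mutex_t g_lua_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical bookkeeping: 1 while the lock is held, 0 otherwise.
 * It is only written while the mutex is held. */
static int g_lua_lock_depth = 0;

void LuaLock(lua_State *L)
{
    (void)L;                       /* the lock is global, not per state */
    pthread_mutex_lock(&g_lua_mutex);
    g_lua_lock_depth++;
}

void LuaUnlock(lua_State *L)
{
    (void)L;
    assert(g_lua_lock_depth == 1); /* catches unlock without a matching lock */
    g_lua_lock_depth--;
    pthread_mutex_unlock(&g_lua_mutex);
}

/* Accessor for debugging; meaningful only when called by the lock holder
 * or while no other thread is running Lua. */
int LuaLockDepth(void)
{
    return g_lua_lock_depth;
}

/* Lua is then pointed at the hooks from luaconf.h or a user header:
 *   #define lua_lock(L)   LuaLock(L)
 *   #define lua_unlock(L) LuaUnlock(L)
 */
```

With this arrangement every point where Lua calls lua_unlock is a point where another OS thread may start running Lua against the same shared global state.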
After a couple of days of furious debugging, I have found the following:
- I have put numerous checks within lvm.c's luaV_execute to narrow
down the point where base is modified
- It appears that the modification occurs across dojump's
luai_threadyield (somewhere between entering and leaving it)
- I modified the dojump macro to check the validity of base and
throw an exception if wrong
- What I can see is that base was fine before the threadyield
and has changed after it.
- As far as luaV_execute is concerned, base must not change
(or, where L->base is expected to change, the call is wrapped in
Protect)
- I have occasions where L->base has changed in an OP_TEST
opcode (where L->base was definitely not expected to change!)
- The value of L->base has changed in other opcodes too (not
just OP_TEST)
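The symptom above (a cached base that no longer matches L->base after the yield) can be reproduced in a toy model. This is only a sketch under stated assumptions: ToyState and every toy_* name are hypothetical stand-ins, not Lua's real structures. It assumes 5.1-style sources, where luai_threadyield(L) expands to lua_unlock(L) followed by lua_lock(L), so anything that moves the stack while the lock is released fixes up the pointer held in the state but not a copy held in a local variable.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for lua_State: a value stack and a base pointer into it. */
typedef struct ToyState {
    int   *stack;   /* start of the stack array            */
    size_t size;    /* number of slots allocated           */
    int   *base;    /* base of the currently running frame */
} ToyState;

/* Snapshot of base taken before a yield: its address as an integer
 * (safe to compare later) and its offset from the stack start. */
typedef struct BaseSnapshot {
    uintptr_t addr;
    size_t    offset;
} BaseSnapshot;

/* Allocate `size` slots and place base at slot `base_index`. 0 on success. */
int toy_init(ToyState *T, size_t size, size_t base_index)
{
    assert(base_index <= size);
    T->stack = calloc(size, sizeof *T->stack);
    if (T->stack == NULL) return -1;
    T->size = size;
    T->base = T->stack + base_index;
    return 0;
}

/* Move the stack to a fresh block and fix up T->base, much as a stack
 * resize would. Copies of the old base held elsewhere are NOT updated. */
int toy_move_stack(ToyState *T, size_t new_size)
{
    size_t base_off = (size_t)(T->base - T->stack);
    size_t keep = T->size < new_size ? T->size : new_size;
    int *fresh;
    assert(base_off <= new_size);
    fresh = calloc(new_size, sizeof *fresh);
    if (fresh == NULL) return -1;
    memcpy(fresh, T->stack, keep * sizeof *fresh);
    free(T->stack);
    T->stack = fresh;
    T->size  = new_size;
    T->base  = fresh + base_off;
    return 0;
}

BaseSnapshot toy_snapshot(const ToyState *T)
{
    BaseSnapshot s;
    s.addr   = (uintptr_t)T->base;
    s.offset = (size_t)(T->base - T->stack);
    return s;
}

/* Returns 1 if T->base still has the address recorded in the snapshot. */
int toy_base_unchanged(const ToyState *T, BaseSnapshot s)
{
    return (uintptr_t)T->base == s.addr;
}

/* Recompute base from the saved offset, which survives a stack move. */
int *toy_restore_base(const ToyState *T, BaseSnapshot s)
{
    return T->stack + s.offset;
}
```

Taking a snapshot before the yield and calling toy_base_unchanged after it mirrors the check added to dojump. Restoring from the offset rather than the address is the same idea as reloading base from L->base after a call that may reallocate the stack.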
I'm trying to pin down where this "corruption" is occurring, and
wondered if it might be down to garbage collection or some other
'shuffling' process. It would appear that other parts of Lua may be
considering the lua_State to be 'unlocked' and therefore safe to modify
- when in fact it is not (because it is running within the context
of luaV_execute).
I am hoping to craft some debug code that can pin down where the change
is occurring, but I'd be grateful for any pointers or suggestions
in the meantime!
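One possible shape for such debug code, offered as a sketch rather than a known fix: record per OS thread whether it currently holds the global Lua lock, so that any routine that modifies a lua_State can assert the lock is held by the caller. It assumes POSIX threads and C11 _Thread_local, which an embedded toolchain may not provide; DebugLuaLock, DebugLuaUnlock, LuaLockHeldByMe and LUA_ASSERT_LOCKED are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Forward declaration so the sketch compiles without the Lua headers. */
typedef struct lua_State lua_State;

static pthread_mutex_t g_debug_lua_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag: nonzero while THIS OS thread holds the global lock.
 * Being thread-local, it can be read without any synchronisation. */
static _Thread_local int t_holds_lua_lock = 0;

int LuaLockHeldByMe(void)
{
    return t_holds_lua_lock;
}

void DebugLuaLock(lua_State *L)
{
    (void)L;
    assert(!t_holds_lua_lock);     /* the mutex is not recursive */
    pthread_mutex_lock(&g_debug_lua_mutex);
    t_holds_lua_lock = 1;
}

void DebugLuaUnlock(lua_State *L)
{
    (void)L;
    assert(t_holds_lua_lock);      /* unlock without owning the lock */
    t_holds_lua_lock = 0;
    pthread_mutex_unlock(&g_debug_lua_mutex);
}

/* Drop this into any routine suspected of touching a lua_State,
 * for example the code paths that resize a stack. */
#define LUA_ASSERT_LOCKED() assert(LuaLockHeldByMe())
```

If LUA_ASSERT_LOCKED() never fires in the stack-resizing paths, the change to L->base is being made by a thread that legitimately holds the lock, which would point at the yield itself rather than at an unlocked access.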
Thanks :-)
Matt "Matic"