Jerome Vuarand wrote:
2009/12/1 Javier Guerra <javier@guerrag.com>:
  
On Tue, Dec 1, 2009 at 10:48 AM, Matt 'Matic' (Lua) <lua@photon.me.uk> wrote:
    
There are quite a lot of C/C++ extensions added into Lua that are
called from within each Lua context.

This is probably just a side detail and not relevant.
      
Do any of these extensions touch a lua_State other than the one that
called it? Remember that the Lua core calls lua_unlock just before
executing a C extension, so some other state may be running
concurrently with your C code; if you modify it, it will be corrupted.
    
I had similar issues. If several native threads use a given coroutine,
Lua C API calls from different C extensions may be interleaved, and
while the interpreter state itself stays safe, the contents of the
stack can end up wrong. Many luaL_ functions become unsafe in that
regard, since they may unlock the state in the middle of processing.

To solve the problem I decided to expose lua_lock and lua_unlock from
the Lua API (I have a patch available somewhere), so that I can call
them around any C code that accesses shared Lua states. This implies
that the lua_lock implementation has to be recursive, but that's quite
easy to do with most threading APIs.
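
A recursive lock of that kind can be sketched with pthreads roughly as
follows; the names universe_lock/universe_unlock are made up for
illustration and are not the actual patch:

    /* Sketch only: back lua_lock/lua_unlock with one recursive mutex
     * shared by the whole universe. */
    #include <pthread.h>
    #include "lua.h"

    static pthread_mutex_t universe_mutex;

    void universe_lock_init (void) {
      pthread_mutexattr_t attr;
      pthread_mutexattr_init(&attr);
      /* A recursive mutex lets the same native thread re-take the lock
       * without deadlocking, which is what nested lua_lock calls need. */
      pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
      pthread_mutex_init(&universe_mutex, &attr);
      pthread_mutexattr_destroy(&attr);
    }

    void universe_lock (lua_State *L)   { (void)L; pthread_mutex_lock(&universe_mutex); }
    void universe_unlock (lua_State *L) { (void)L; pthread_mutex_unlock(&universe_mutex); }

    /* and then make the core's lock macros expand to these functions,
     * e.g. in luaconf.h (with the two prototypes visible there too):
     *   #define lua_lock(L)    universe_lock(L)
     *   #define lua_unlock(L)  universe_unlock(L)
     */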

Another solution, which doesn't involve patching Lua, is to make sure
that no two native threads ever use the same Lua thread (coroutine).
This is why most threading libraries for Lua create at least one Lua
coroutine per native thread (e.g. LuaThread, LuaProc).
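
A rough sketch of that per-native-thread arrangement, assuming a single
shared universe and with the helper names made up for illustration (the
universe lock above must be held while calling into the shared state):

    #include "lua.h"
    #include "lauxlib.h"

    /* One coroutine per native thread, anchored in the registry so the
     * GC cannot collect it while the native thread still uses it. */
    typedef struct {
      lua_State *co;  /* this native thread's private Lua thread */
      int ref;        /* registry reference keeping it alive */
    } per_thread;

    static per_thread attach_native_thread (lua_State *universe) {
      per_thread p;
      p.co = lua_newthread(universe);                 /* pushes the new thread */
      p.ref = luaL_ref(universe, LUA_REGISTRYINDEX);  /* pops and anchors it */
      return p;
    }

    static void detach_native_thread (lua_State *universe, per_thread *p) {
      luaL_unref(universe, LUA_REGISTRYINDEX, p->ref);  /* let the GC reclaim it */
      p->co = NULL;
    }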

  
As you both have said, on the surface it would appear as though I had
two threads accessing one lua_State, or some other C/C++ corruption of
Lua. However, I've added a huge amount of checking code to prove that
isn't the case (just doubting myself!!). Everything appears correct
and coherent.

My view is that the Lua VM is making assumptions about the L->base
value. Generally, L->base doesn't change, so the VM keeps a local copy
to avoid the indirection and speed up the LVM. For some opcodes the VM
knows that L->base could change (or definitely will), so the call is
wrapped in the "Protect" macro, which reassigns the local copy of base
after completion.
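
If I remember right, the 5.1 lvm.c macro looks something like this - it
saves the pc and refreshes the cached base after the protected call:

    #define Protect(x)  { L->savedpc = pc; {x;}; base = L->base; }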

However, it would appear that there are 6 (IIRC) opcodes that call
dojump outside the context of "Protect" and assume that L->base is not
going to change. If each OS thread has its own Lua "universe" (one
independent lua_State per OS thread), that assumption holds.
Once you add OS-level threading with a single shared Lua "universe" -
even if you use lua_newthread and strictly keep each lua_State on one
OS thread - that assumption is no longer valid.

Consequently, I have changed the "dojump" macro in lvm.c to now be:

    #define dojump(L,pc,i) { (pc) += (i); luai_threadyield(); base = L->base; }
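
For comparison, the stock macro (again from memory) only advances the
pc and yields, without refreshing base:

    #define dojump(L,pc,i) { (pc) += (i); luai_threadyield(); }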

Now, I know that some "dojump" calls are also wrapped inside "Protect" and therefore the "base = L->base" is going to be duplicated in those cases. Of course, my optimising compiler removes the redundancy.

Guess what - the problem has gone away and Lua is not failing its assertions anymore (and my Lua code isn't running off the rails)!!


Jerome - I reckon your adding the extra lock/unlock inside your C
routines is probably masking the issue by greatly reducing its
probability, but you may well find it's the same one that I have and
that it can be resolved completely with the dojump patch.


Any thoughts or comments??


Matt