I've run into an interesting problem while hunting down a bug in our code that is using Lua 5.1.4 and toLua 5.1.3. It took a lot of hackery and printfs to finally figure out what was going on, so here it is in a nutshell.
We have a state machine builder object SMBuilder with a function called AddState that allocates and stores a new state in an internal list. This class and method are exposed to Lua using toLua like so:
SMBuilder(const char* tableName);
SMBuilderState* AddState(const char* pName, int stateEnum = -1);
void AddIntEnterMessage(int msg, uint32_t data);
static SMBuilder *CreateSMBuilder(const char *pTableName);
static uint32_t ConvertEnumToUint32(int input);
The relevant code that is generated looks like this (clipped error handling for compactness):
/* method: AddState of class SMBuilder */
static int tolua_LuaSMBuilder_SMBuilder_AddState00(lua_State* tolua_S)
{
  SMBuilder* self = (SMBuilder*) tolua_tousertype(tolua_S,1,0);
  const char* pName = ((const char*) tolua_tostring(tolua_S,2,0));
  int stateEnum = ((int) tolua_tonumber(tolua_S,3,-1));
  SMBuilderState* tolua_ret = (SMBuilderState*) self->AddState(pName,stateEnum);
  ...
}

/* method: ConvertEnumToUint32 of class LuaSMBuilderManager */
static int tolua_LuaSMBuilder_LuaSMBuilderManager_ConvertEnumToUint3200(lua_State* tolua_S)
{
  int input = ((int) tolua_tonumber(tolua_S,2,0));
  uint32_t tolua_ret = LuaSMBuilderManager::ConvertEnumToUint32(input);
  void* tolua_obj = new uint32_t(tolua_ret);
  ...
}

/* finalizer for uint32_t userdata */
static int tolua_collect_uint32_t (lua_State* tolua_S)
{
  uint32_t* self = (uint32_t*) tolua_tousertype(tolua_S,1,0);
  ...
}
And finally, our Lua code might look similar to this:
local function Foo()
  local val = LuaSMBuilderManager:ConvertEnumToUint32(0);
  builder = LuaSMBuilderManager:CreateSMBuilder("state machine");
  state = builder:AddState("state_x");
  state:AddIntEnterMessage( LuaSMBuilderManager.MSG_SOMETHING, val );
  builder:delete();
end
The problem we were seeing was that, on the second call to Foo(), the value of 'val' passed to AddIntEnterMessage could be garbage. I was able to determine that the contents of the userdata created in ConvertEnumToUint32 had in fact been deleted before 'val' went out of scope (this was very intermittent, by the way). Why? Here is what I've determined:
On the first pass through Foo, SMBuilder allocates a new SMBuilderState at address X. toLua's generated code creates a new usertype and stores the userdata block in its internal tolua_ubox table with address X as the key. Once builder:delete is called, address X is no longer valid, but the toLua-generated code does not release the SMBuilderState usertype, so it remains in tolua_ubox (uh oh!).
On the second pass through Foo, the toLua-generated code for ConvertEnumToUint32 allocates a new uint32_t at address X (that's fine; that block was freed when the SMBuilder was deleted in the first pass). toLua then pushes a new usertype onto the stack, again using address X as a key in the tolua_ubox table, but wait, there is already an entry under that key! toLua also registers a special finalizer for this uint32_t via the tolua_clone function. The finalizer is added to a table (tolua_gc), again with address X as the key. Here's where the fun begins!
Before AddIntEnterMessage is called, a garbage collection pass is triggered. It finds the old userdata that was used to store the SMBuilderState pointer and, with that address, calls the toLua function class_gc_event, which looks up the finalizer using the address as a key into tolua_gc. As a result tolua_collect_uint32_t is called, and it deletes the uint32_t at address X with no idea that the userdata being collected was for something completely different. If something else then writes over that memory (which in our case does happen), AddIntEnterMessage chokes.
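To make the sequence easier to follow, here is a small stand-alone C++ model of it. This is only a sketch and not toLua's actual code: Registry, Userdata, push, push_owned, collect, scribble_uint32 and run_two_passes are all names I made up for illustration. A fixed buffer plays the role of address X so that the address reuse is deterministic, and the finalizer overwrites the value instead of really calling delete.

```cpp
#include <cassert>
#include <cstdint>
#include <new>
#include <unordered_map>

// Hypothetical, simplified model of the binding's bookkeeping (not toLua's code).
struct Userdata { void* ptr; };      // a Lua userdata boxing a C++ pointer
using Finalizer = void (*)(void*);

struct Registry {
    // Plays the role of tolua_gc: address -> finalizer.
    std::unordered_map<void*, Finalizer> gc;

    // Box a pointer the binding does not own (the SMBuilderState case).
    Userdata push(void* p) { return Userdata{p}; }

    // Box a pointer the binding owns and must finalize (the uint32_t case).
    Userdata push_owned(void* p, Finalizer f) { gc[p] = f; return Userdata{p}; }

    // Collecting a userdata looks the finalizer up by address only,
    // with no knowledge of which userdata registered it.
    void collect(const Userdata& ud) {
        auto it = gc.find(ud.ptr);
        if (it == gc.end()) return;
        Finalizer f = it->second;
        gc.erase(it);
        f(ud.ptr);
    }
};

struct State { int id; };            // stand-in for SMBuilderState

// Stand-in for tolua_collect_uint32_t: models "deleted, then overwritten".
void scribble_uint32(void* p) {
    *static_cast<std::uint32_t*>(p) = 0xDEADBEEFu;
}

// Returns the value that 'val' holds after the stray collection.
std::uint32_t run_two_passes() {
    alignas(8) static unsigned char slot[16];   // stands in for heap address X
    Registry reg;

    // Pass 1: a state is created at X and later destroyed by its builder,
    // but the userdata that boxes X is never released.
    State* s = new (slot) State{7};
    Userdata stale = reg.push(s);
    s->~State();

    // Pass 2: the converted enum value lands on the same address X.
    std::uint32_t* v = new (slot) std::uint32_t(42);
    Userdata val = reg.push_owned(v, scribble_uint32);
    assert(val.ptr == stale.ptr);               // the key collision

    // The collector finally gets to the stale state userdata.
    reg.collect(stale);

    return *static_cast<std::uint32_t*>(val.ptr);
}
```

Calling run_two_passes() gives back 0xDEADBEEF rather than 42: collecting the stale state userdata ran the uint32_t finalizer while 'val' was still in use.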
I think the correct solution is to change our pattern. I've googled for answers and searched this mailing list's archives, but I haven't found the right way to do what we're trying to do, which is to create an object (SMBuilderState) from another object (SMBuilder) and have it correctly destroyed (via tolua_release w/o calling delete) when it goes out of scope. I can't simply add a destructor to SMBuilderState and call :delete, because SMBuilder still has pointers to that memory.
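Roughly, the shape I have in mind looks like the sketch below. It is a plain C++ model with no Lua involved, and every name in it (Registry, Builder, State, add_state, release, builder_cleans_up) is hypothetical. Registry::release stands in for whatever the binding offers to forget a boxed address; I am assuming tolua_release would fill that role in the real code. The owner releases the entries for its children right before it frees them, and the children themselves are never deleted from the Lua side.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the binding's address-keyed table (like tolua_ubox).
struct Registry {
    std::unordered_map<const void*, int> boxed;   // address -> userdata id
    int next_id = 0;

    int push(const void* p) {
        auto it = boxed.find(p);
        if (it != boxed.end()) return it->second;  // reuse the existing entry
        return boxed[p] = ++next_id;
    }
    void release(const void* p) { boxed.erase(p); }
    bool knows(const void* p) const { return boxed.count(p) != 0; }
};

struct State { std::string name; };               // stand-in for SMBuilderState

class Builder {                                   // stand-in for SMBuilder
public:
    explicit Builder(Registry& reg) : reg_(reg) {}
    Builder(const Builder&) = delete;
    Builder& operator=(const Builder&) = delete;

    // The builder owns the state; the script side only borrows the pointer.
    State* add_state(const std::string& name) {
        states_.push_back(std::unique_ptr<State>(new State{name}));
        State* s = states_.back().get();
        reg_.push(s);
        return s;
    }

    // Release every boxed entry for our states before their memory is freed,
    // so a later allocation at the same address finds no stale entry.
    ~Builder() {
        for (const auto& s : states_) reg_.release(s.get());
    }

private:
    Registry& reg_;
    std::vector<std::unique_ptr<State>> states_;
};

// Returns true if no entry for the state outlives its builder.
bool builder_cleans_up() {
    Registry reg;
    const void* addr = nullptr;
    {
        Builder b(reg);
        addr = b.add_state("state_x");
        assert(reg.knows(addr));
    }   // ~Builder releases the entry, then the state is freed
    return !reg.knows(addr) && reg.boxed.empty();
}
```

The point of the design is that ownership stays in one place: only the builder frees states, and it is also the only thing that tells the registry an address has gone away.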
Is there a common pattern to follow for this type of setup? I appreciate any insight anyone can offer.