Algorithm help: passing strings between threads with minimal copying

This isn't really specific to Lua, but since I'm doing it in Lua, and people on this list seem to know these things pretty well, I'll ask here.

I'm implementing a system which allows a Lua script to create a new OS thread. That thread creates its own, independent Lua state and runs some code given to it at creation time. The two threads need to be able to communicate with each other. The new thread can create more new threads as well. This is a module that needs to work under standard Lua, so unfortunately I can't take advantage of the lua_lock and lua_unlock macros.

What I'm having trouble with is passing messages between threads. The interface is simple enough: when you have a reference to a thread (because you created it or it was given to you as a parameter), you can send a message (a string) to it. The message goes into a queue and the thread can call a function to retrieve the next message from the queue.

Of course one thread can't touch another thread's Lua state, since a state can only be used by one thread at a time. So we need to store the received messages somewhere until the receiving thread asks for them. No problem, we have a struct for each thread already (holding its lua_State* and other nice things); we can copy the string into a list in that struct, using mutexes to ensure it's not being read and written at the same time.
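As a rough sketch of this copying design, assuming POSIX threads: the names thread_info, msg, queue_send and queue_receive are made up for illustration, and a real per-thread struct would also hold the lua_State*. The receiver would pass m->data and m->len to lua_pushlstring() and then free the node.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One queued message: a heap copy of the sender's string. */
typedef struct msg {
    struct msg *next;
    size_t len;
    char data[];            /* the bytes follow the header */
} msg;

/* Hypothetical per-thread struct (the real one also holds lua_State*). */
typedef struct thread_info {
    pthread_mutex_t lock;   /* guards head and tail */
    msg *head, *tail;
} thread_info;

void thread_info_init(thread_info *t) {
    pthread_mutex_init(&t->lock, NULL);
    t->head = t->tail = NULL;
}

/* Called by the sender: copy the bytes and append. Returns 0 on success. */
int queue_send(thread_info *to, const char *s, size_t len) {
    assert(to != NULL && s != NULL);
    msg *m = malloc(sizeof *m + len);
    if (!m) return -1;
    m->next = NULL;
    m->len = len;
    memcpy(m->data, s, len);

    pthread_mutex_lock(&to->lock);
    if (to->tail) to->tail->next = m; else to->head = m;
    to->tail = m;
    pthread_mutex_unlock(&to->lock);
    return 0;
}

/* Called by the receiver: detach the oldest message, or NULL if empty.
   The caller owns the returned node and must free() it. */
msg *queue_receive(thread_info *self) {
    assert(self != NULL);
    pthread_mutex_lock(&self->lock);
    msg *m = self->head;
    if (m) {
        self->head = m->next;
        if (!self->head) self->tail = NULL;
    }
    pthread_mutex_unlock(&self->lock);
    return m;
}
```

The mutex is held only while the list pointers change, so the copy itself happens outside the lock.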

The issue with that design, though, is excess copying. If thread A sends a message to thread B, that means A has a copy of the string, B's incoming message queue will have another copy, and B's Lua state will make a third copy when lua_pushlstring() is used to receive the message. I'd quite like to eliminate that extra copy.

Alright, so we don't copy the message into B's queue; we give it a pointer to the string in A's Lua state, and anchor it in the registry to keep it from being collected before B receives the message. Now B copies directly from A, no third copy, no problem?

Well, two problems actually. Those strings have to be removed from A's registry sometime so they aren't wasting space, and if A exits before B checks its messages, the string is still going to get collected and the pointer will be invalid.

So we'll have B notify A in some fashion when it receives a message, so that A can allow the message to be collected. Maybe we'll keep the ID from luaL_ref in the message struct as well, and then B can add that ID to a list in A's thread struct (the same way A adds the message to B's message queue) telling it that the message is now safe to collect. Then A can check that list periodically to collect messages, and can wait before exiting to ensure all messages are received.
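The notification list could look something like the sketch below, again assuming POSIX threads. release_list, release_post and release_drain are hypothetical names. The integers stand for the IDs returned by luaL_ref; the sender would pass each drained ID to luaL_unref on its own state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Lives in the sender's thread struct: ref IDs the receiver is done with. */
typedef struct release_list {
    pthread_mutex_t lock;
    int *refs;
    size_t count, cap;
} release_list;

void release_list_init(release_list *rl) {
    pthread_mutex_init(&rl->lock, NULL);
    rl->refs = NULL;
    rl->count = rl->cap = 0;
}

/* Called by the receiver (B) once it has its own copy of the string.
   Returns 0 on success, -1 if the list could not grow. */
int release_post(release_list *rl, int ref) {
    assert(rl != NULL);
    int rc = 0;
    pthread_mutex_lock(&rl->lock);
    if (rl->count == rl->cap) {
        size_t ncap = rl->cap ? rl->cap * 2 : 8;
        int *p = realloc(rl->refs, ncap * sizeof *p);
        if (p) { rl->refs = p; rl->cap = ncap; } else rc = -1;
    }
    if (rc == 0) rl->refs[rl->count++] = ref;
    pthread_mutex_unlock(&rl->lock);
    return rc;
}

/* Called periodically by the sender (A): move up to max released IDs
   into out and return how many were moved. A then unrefs each one
   from its own thread, so only A ever touches A's Lua state. */
size_t release_drain(release_list *rl, int *out, size_t max) {
    assert(rl != NULL && out != NULL);
    pthread_mutex_lock(&rl->lock);
    size_t n = rl->count < max ? rl->count : max;
    for (size_t i = 0; i < n; i++) out[i] = rl->refs[i];
    for (size_t i = n; i < rl->count; i++) rl->refs[i - n] = rl->refs[i];
    rl->count -= n;
    pthread_mutex_unlock(&rl->lock);
    return n;
}
```

To wait before exiting, A could count the messages it has sent and subtract the IDs it has drained; that bookkeeping is left out here.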

But, what if B never checks its messages? Perhaps B is in a loop waiting for some other event that never comes? Then A will sit waiting forever before it's allowed to exit, waiting for those sent messages to become collectible, which will never happen.

The two solutions to this I came up with are:

1) When A exits, it can copy all unreceived messages into malloc()'d buffers and update the pointers to them in B's message queue (along with a flag telling B to free() them itself). Thus there's a third copy like in the original design, but only in the "worst case scenario" when A exits before B receives its message. However, I found this complicated the design considerably: when A exits (or collects a reference to B), it has to look through B's message queue and copy any messages that came from A. That means each message has to identify which thread it came from, so the messages need pointers back to their senders, and those pointers can suddenly become invalid and are difficult to keep track of.

2) Ensure B checks its messages periodically, by using a hook[1] which is called every so often. In the hook, B copies all received messages into its registry (so that A can then collect them), and when the script actually asks for messages, it retrieves them from there (and looks at the queue if the registry doesn't have any messages). This seems like it'd be effective except in the "worst case" when B gets stuck in an endless loop and the hook never gets called, which could potentially cause a cascading failure (A never exits because it's waiting for B which never calls its hook).
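For the first option, the exit-time copy might be sketched as below, assuming POSIX threads. All names (msg, queue, detach_sender) are hypothetical. The sender field is assumed to be an opaque tag that is only compared and never dereferenced; that avoids following a dangling pointer, but it does not remove the bookkeeping described in option 1.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct msg {
    struct msg *next;
    const char *data;     /* borrowed from the sender, or malloc()'d if owned */
    size_t len;
    int owned;            /* nonzero: the receiver must free() data */
    const void *sender;   /* identity tag only, never dereferenced */
} msg;

typedef struct queue {
    pthread_mutex_t lock;
    msg *head, *tail;
} queue;

void queue_init(queue *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
}

void queue_push(queue *q, msg *m) {
    m->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = m; else q->head = m;
    q->tail = m;
    pthread_mutex_unlock(&q->lock);
}

/* Called by a sender that is about to exit: give every message it still
   has pending in q a private copy of its bytes. Returns the number of
   messages copied, or -1 if an allocation failed (messages copied before
   the failure stay converted). */
int detach_sender(queue *q, const void *sender) {
    assert(q != NULL);
    int copied = 0;
    pthread_mutex_lock(&q->lock);
    for (msg *m = q->head; m; m = m->next) {
        if (m->owned || m->sender != sender) continue;
        char *copy = malloc(m->len ? m->len : 1);
        if (!copy) { copied = -1; break; }
        memcpy(copy, m->data, m->len);
        m->data = copy;
        m->owned = 1;
        m->sender = NULL;   /* the tag is meaningless once the sender is gone */
        copied++;
    }
    pthread_mutex_unlock(&q->lock);
    return copied;
}
```

The whole walk happens under the receiver's queue lock, so the receiver cannot pick up a message while its data pointer is being swapped.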

I feel like the second solution is ideal (I already have such a hook in place for other reasons, so overhead isn't a huge issue), but I wondered if anyone else had any feedback on this.
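A sketch of the second option's receiving side, assuming POSIX threads. The names receiver, drain_hook and next_message are hypothetical, and the private stash list stands in for B's registry: a real module would call lua_pushlstring() and anchor the result instead of using malloc(), then tell the sender that the string may be collected.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct msg {            /* node owned by the sender */
    struct msg *next;
    const char *data;           /* borrowed from the sender's Lua state */
    size_t len;
} msg;

typedef struct stash_item {     /* receiver's own copy */
    struct stash_item *next;
    size_t len;
    char data[];
} stash_item;

typedef struct receiver {
    pthread_mutex_t lock;       /* guards the inbox only */
    msg *inbox_head, *inbox_tail;
    stash_item *stash_head, *stash_tail;   /* touched only by the receiver */
} receiver;

void receiver_init(receiver *r) {
    pthread_mutex_init(&r->lock, NULL);
    r->inbox_head = r->inbox_tail = NULL;
    r->stash_head = r->stash_tail = NULL;
}

void inbox_push(receiver *r, msg *m) {     /* called by senders */
    m->next = NULL;
    pthread_mutex_lock(&r->lock);
    if (r->inbox_tail) r->inbox_tail->next = m; else r->inbox_head = m;
    r->inbox_tail = m;
    pthread_mutex_unlock(&r->lock);
}

/* The periodic hook: take the whole inbox, copy each message into the
   stash, and return how many were copied. */
size_t drain_hook(receiver *r) {
    assert(r != NULL);
    pthread_mutex_lock(&r->lock);
    msg *m = r->inbox_head;
    r->inbox_head = r->inbox_tail = NULL;
    pthread_mutex_unlock(&r->lock);

    size_t n = 0;
    while (m) {
        stash_item *s = malloc(sizeof *s + m->len);
        if (!s) {               /* out of memory: put the rest back in front */
            msg *last = m;
            while (last->next) last = last->next;
            pthread_mutex_lock(&r->lock);
            last->next = r->inbox_head;
            if (!r->inbox_tail) r->inbox_tail = last;
            r->inbox_head = m;
            pthread_mutex_unlock(&r->lock);
            break;
        }
        s->next = NULL;
        s->len = m->len;
        memcpy(s->data, m->data, m->len);
        if (r->stash_tail) r->stash_tail->next = s; else r->stash_head = s;
        r->stash_tail = s;
        msg *done = m;
        m = m->next;
        done->next = NULL;      /* this is where the sender would be notified */
        n++;
    }
    return n;
}

/* What the script's "receive" call would do: stash first, then the inbox.
   Returns NULL if there is nothing; the caller frees the item. */
stash_item *next_message(receiver *r) {
    if (!r->stash_head) drain_hook(r);
    stash_item *s = r->stash_head;
    if (s) {
        r->stash_head = s->next;
        if (!r->stash_head) r->stash_tail = NULL;
    }
    return s;
}
```

Because the stash is private to B, only the short inbox swap needs the mutex, and message order is preserved across the stash and the inbox.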

[1] The method I use isn't really a "hook" (as in debug hook). Rather, when creating the Lua state I leave an object on the stack (a 1-byte userdata) with a __gc metamethod attached to it. Since the object is left on the stack but not referenced anywhere, it will be collected. Its __gc metamethod performs whatever periodic tasks need doing, and then leaves another such object with the same metamethod on the stack, so that the metamethod will fire again on the next GC cycle. (It has to create a new object, since each object's __gc is called only once.) This way the "hook" executes periodically as long as the garbage collector is running.
