Don't forget stack reallocation/growth.
Ah, I see -- luaM_realloc_() writes to G(L)->GCdebt, so it would need a lock or something as well. I'd missed that, thanks. (Maybe just converting GCdebt to an atomic ptrdiff_t would mostly solve this one -- that's probably cleaner than messing around with the implied nested locks you'd get from the interactions with luaC_newobj.)
luaY_parser / load / require
could also be fun…
Yeah, they shouldn't come up in my hypothetical use cases, but they do make me nervous. Apart from the implied luaC_newobj / luaM_realloc / luaS_newlstr calls, I haven't actually found any places where I can see they'd need a lock -- but I haven't really looked that hard, either. It might be wise to just disallow them in my parallel evaluation mode and save myself the worry.
I'm feeling increasingly confident that the big issue for any parallel coroutine execution scheme is table writes. And I'm starting to think that, in practice, the most sensible way of handling the issue might be to say that while I'm doing parallel executions, any write into a table is simply an error.

That's pretty darn restrictive, but I think I have some use cases where the ability to run even such heavily restricted Lua code in parallel would be useful, and my initial tests suggest I might get as much as a 4X speedup for my trouble, which is big enough to be tempting. (I could even write a thread-safe table-like userdata class for those cases where I really do need to write into something table-ish during a parallel execution. Adding locking logic around all table accesses, though, feels like it would add too much bloat to an otherwise pretty lean interpreter.)