|
It was thus said that the Great Coroutines once stated:
> On Fri, Apr 4, 2014 at 5:19 PM, Sean Conner <sean@conman.org> wrote:
>
> > It was thus said that the Great Coroutines once stated:
> > >
> > > I do agree that separate global_States in separate threads is the
> > > safer/saner way to go. The issue I have with that is swallowing the cost
> > > of marshalling/serialization. I wish there were a lua_xmove() for moving
> > > objects between global_States, so you could make all "Lua instances"
> > > visible within a shared memory space, and swap objects directly between
> > > them.
> >
> > I don't know. I want to say that if you want to move an arbitrary object
> > between threads you are doing it wrong, but I'm not sure what you are
> > trying
> > to do, so I won't say that 8-P
> >
> > In general, it's a difficult, if not outright impossible, task.  Tables
> > are difficult, and I suspect userdata is all but impossible to handle
> > correctly with a "lua_gsxmove()" function.
> >
> > And as for the cost of marshalling/serialization, remember, the QNX X
> > server I'm talking about did all that, and *still* was faster than a shared
> > memory version of the server.
> >
> > -spc
> >
> >
> You could make lua_States from separate processes visible to each other
> with shared memory, but what's on the stack is most likely a reference if
> the object isn't something like a number. You could move these references
> between lua_States of different processes but the data wouldn't be moved
> from one global_State to the other. This is why marshalling is the
> safest/slowest way right now :(

  How do you know that marshalling is the slowest way?  You are making that
assumption (the QNX X server marshals, and it's faster than using shared
memory).  It may also be that you are trying to share too much, thus causing
cache contention [1].
-spc (The only way to be sure is to measure ... )
[1] A novel new approach to spinlocks:
http://lwn.net/Articles/590243/
It uses way more memory than a traditional spinlock (something like
2 * the number of CPUs), but in practical real-world tests [2] it was 100%
faster, *because* it reduced cache contention.
[2] A particular type of benchmark *cough cough*.