- Subject: Re: Scalability of Lua, many small vs. few big lua_States?
- From: Tim Hill <drtimhill@...>
- Date: Tue, 28 Jan 2014 10:13:39 -0800
On Jan 27, 2014, at 4:31 PM, Eric Wing <ewmailing@gmail.com> wrote:
> On 1/27/14, Tim Hill <drtimhill@gmail.com> wrote:
>>
>> This is a little tricky to answer definitively. For our stress testing of
>> Lua we have run up to 10,000 states with no real problems, so nothing will
>> actually break using your current fine grained model. However, if the
>> computation done by each state is very simple then the amount of time the OS
>> spends switching threads may become a significant percentage of total CPU
>> time (perhaps even 50%). In this case, re-factoring the code as you suggest
>> (fewer states) might actually increase overall throughput (or, manage the
>> same throughput with lighter load on the CPU etc).
>>
>> One advantage of Lua when used in this way is that, since the core VM is so
>> small, pretty much all the VM can fit in the L1/L2 caches of an x86 class
>> CPU, meaning the performance of running VM instructions starts to get *very*
>> fast. And if all your threads are sharing the same VM code (which they will
>> be if they are all running in the same OS process), then you won’t get a
>> cache warming hit when switching threads. The net-net is you should get good
>> performance either way, with a slight boost if you go for “fewer, bigger”
>> states.
>>
>> —Tim
>>
>
> Great info, Tim. Might I add/suggest though that "fewer, bigger" is a
> false dilemma for getting around the context switching bottleneck? It
> seems to me that the real problem is running more threads than
> cores/CPUs. If you have a scheduler, preferably one that is aware of
> current hardware availability (thinking of Apple's Grand Central
> Dispatch as one example), then you avoid overwhelming your hardware
> resources and you don't necessarily have to redesign/refactor code to
> make "fewer, bigger" Lua states.
>
> Also, if anybody finds this data point useful, I measured a single
> baseline Lua state to take about 4-5KB of RAM on a 64-bit Mac.
>
> -Eric
Agreed, if you can take advantage of stuff like GCD (a great technology IMHO, btw), then things work well. I’ve not thought about how to wire Lua into GCD; my first thought is that it might be tricky, as the Lua execution model is not interruptible (it can only multitask cooperatively).
As a point of reference, in my most recent work we use quite a few independent Lua states (dozens to hundreds), but the code in each state is designed to be event-driven, and we use a dispatcher that schedules the events across a thread pool based upon the available CPU cores: kind of a “GCD lite”, but cross-platform.
—Tim
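
For readers who want to picture the "GCD lite" arrangement described above, here is a minimal sketch in Python. It is not Tim's implementation, which the thread does not show. The names `State`, `Dispatcher`, `post`, and `_drain` are all hypothetical, and a plain Python object stands in for a `lua_State`. The sketch assumes two things: the pool has one worker per available core, and at most one worker touches a given state at a time, since a single Lua state must not be entered from two OS threads concurrently.

```python
import os
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class State:
    """Hypothetical stand-in for one independent lua_State and its event queue."""

    def __init__(self, name):
        self.name = name
        self.events = deque()    # pending events for this state
        self.scheduled = False   # True while some worker owns this state
        self.handled = []        # events processed so far, in order

    def handle(self, event):
        # A real embedding would call into Lua here (for example via lua_pcall).
        self.handled.append(event)


class Dispatcher:
    """Schedules per-state events across a pool sized to the CPU core count."""

    def __init__(self, workers=None):
        self.pool = ThreadPoolExecutor(max_workers=workers or os.cpu_count() or 1)
        self.lock = threading.Lock()

    def post(self, state, event):
        with self.lock:
            state.events.append(event)
            if state.scheduled:
                return               # a worker is already draining this state
            state.scheduled = True
        self.pool.submit(self._drain, state)

    def _drain(self, state):
        # Runs on a pool thread; processes this state's events one at a time.
        while True:
            with self.lock:
                if not state.events:
                    state.scheduled = False
                    return
                event = state.events.popleft()
            state.handle(event)      # run the handler outside the lock

    def shutdown(self):
        self.pool.shutdown(wait=True)  # waits for all queued drains to finish


if __name__ == "__main__":
    dispatcher = Dispatcher()
    states = [State(f"s{i}") for i in range(100)]
    for n in range(10):
        for s in states:
            dispatcher.post(s, n)
    dispatcher.shutdown()
    print(sum(len(s.handled) for s in states), "events handled")
```

The number of OS threads stays fixed at the core count no matter how many states exist, which is the point Eric raises about not running more threads than cores. One simplification to note: a worker keeps draining a busy state until its queue is empty, so a real dispatcher would probably cap the events handled per turn to keep things fair.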