On Fri, Aug 7, 2015 at 12:18 PM, Rena <hyperhacker@gmail.com> wrote:
>> and it's terribly inefficient: in multicore hardware, most simple
>> locks are by nature system-wide, having to propagate to _every_ core
>> in the system.  By experience, even in a high-memory-bandwidth system
>> like modern Xeon families, there's a low maximum number of inter-core
>> messages per second.  Think like a core can do several thousand
>> operations in the same time as propagating just one lock.
>
> Ouch. It's really necessary to allocate a hardware lock every time? Not
> enough to use an atomic test-and-set instruction on a per-shared-object lock
> flag?


AFAIK, there's no such thing as a "hardware lock" in common x86 chips.
Lately, I've been doing most of my inter-process synchronization via a
small shared integer.  The special case of a single-producer,
single-consumer fixed-size ring buffer can be safe even without
explicit locking.
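
To illustrate the single-producer, single-consumer case, here is a
minimal sketch in C11.  It is not the code I actually run; the names
(spsc_ring, ring_push, ring_pop), the capacity and the uint64_t payload
are all made up for the example.  The idea is that each index has
exactly one writer, so plain acquire/release loads and stores are
enough and no lock is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 1024u            /* hypothetical capacity; must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    _Atomic uint32_t head;         /* next slot to read; written only by the consumer */
    _Atomic uint32_t tail;         /* next slot to write; written only by the producer */
    uint64_t slots[RING_SIZE];     /* hypothetical payload type */
} spsc_ring;

/* Producer side. Returns false when the ring is full. */
static bool ring_push(spsc_ring *r, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return false;
    r->slots[tail & RING_MASK] = value;
    /* release: the slot write becomes visible before the new tail does */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
static bool ring_pop(spsc_ring *r, uint64_t *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & RING_MASK];
    /* release: the slot read completes before the slot is handed back */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}
```

For inter-process use the struct would live in a shared mapping, and
it must start out zeroed.  The indices are free-running and rely on
unsigned wrap-around, which is why the capacity has to be a power of
two.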

Still, when two different cores hold the same address in their
respective caches, as soon as one writes there, the other one has to
be notified to invalidate its copy.  Note that the other core hasn't
even read the flag yet; it has just invalidated a single cache line.
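
For reference, the per-object test-and-set flag from the question could
look roughly like the sketch below.  The names (shared_obj,
obj_try_lock, obj_unlock) and the payload field are hypothetical.  The
point is that it costs no special hardware resource, but as far as I
know every test-and-set is still a write to the flag's cache line, so
it triggers exactly the invalidation traffic described above.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_flag busy;   /* per-object lock flag */
    long value;         /* hypothetical payload protected by the flag */
} shared_obj;

/* Try to take the flag once. Returns true if this caller now owns the object. */
static bool obj_try_lock(shared_obj *o)
{
    /* test_and_set returns the previous state: false means the flag was free */
    return !atomic_flag_test_and_set_explicit(&o->busy, memory_order_acquire);
}

/* Give the object back. Only the current owner should call this. */
static void obj_unlock(shared_obj *o)
{
    atomic_flag_clear_explicit(&o->busy, memory_order_release);
}
```

A caller would typically spin on obj_try_lock() until it succeeds, and
every failed attempt from another core is one more round of cache-line
traffic.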

Just last night, I succeeded in passing 12x10^6 packets per second
(12Mpps) from one process to another, and that's after almost a week
stuck at a highly variable 3-6Mpps.  To get there, I had to make sure
to gather as many packets as possible before writing a single integer
to shared memory.  The limit I'm hitting is right around 55,000
interprocess cache trashings per second.
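
The batching idea can be sketched as follows.  Again this is an
illustration, not my real code: batch_ring, ring_push_batch and the
capacity are invented names and values.  The producer copies a whole
batch into the ring and then publishes it with one store to the shared
index.  If you assume each publish costs one of those ~55,000
invalidations, the figures above work out to roughly
12,000,000 / 55,000 = 218 packets per publish.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 4096u                /* hypothetical capacity; must be a power of two */

typedef struct {
    _Atomic uint32_t tail;         /* the single shared integer the consumer polls */
    _Atomic uint32_t head;         /* advanced by the consumer as it drains slots */
    uint64_t slots[SLOTS];         /* hypothetical packet handles */
} batch_ring;

/* Copy up to n packets into the ring, then publish them with ONE store.
   Returns how many packets were accepted. */
static size_t ring_push_batch(batch_ring *r, const uint64_t *pkts, size_t n)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    uint32_t free_slots = SLOTS - (tail - head);
    if (n > free_slots)
        n = free_slots;
    for (size_t i = 0; i < n; i++)
        r->slots[(tail + (uint32_t)i) & (SLOTS - 1u)] = pkts[i];
    if (n > 0)  /* one shared write per batch instead of one per packet */
        atomic_store_explicit(&r->tail, tail + (uint32_t)n, memory_order_release);
    return n;
}

/* Rough batch size implied by the figures in the message (integer division),
   assuming one invalidation per publish. */
static unsigned packets_per_publish(void)
{
    return 12000000u / 55000u;
}
```

The consumer side would do the mirror image: read the tail once, drain
everything up to it, then write the head once.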

That same code, on the same machine, moves around 58Mpps when kept on
a single core!  (But doing heavy processing on each packet makes it
worthwhile to recruit more cores, even if it's so expensive to
communicate with them.)

-- 
Javier