On Tue, Dec 11, 2012 at 12:48 PM, Sergey Zavadski <sergeyzavadski@gmail.com> wrote:
> The fact that the crash happens only under the extremely high load generated
> by the test (more than 900K requests per minute) also suggests to me that I
> am hitting some other limitation (either OS or hardware).
>
>
> On Tue, Dec 11, 2012 at 7:55 AM, James Graves <james.c.graves.jr@gmail.com> wrote:
>>
>> Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>>
>> > * Because the GC traverses all objects in the system, it is one of the
>> > main "candidates" to suffer from a memory corruption created somewhere
>> > else.
>>
>> There was a recent article linked via Hacker News about one of the
>> popular web cache applications (maybe it was Varnish?). Anyway, this
>> program uses a lot of RAM, and they've been seeing unreproducible bugs
>> occasionally. The authors suspect it is a hardware problem in many
>> cases, so they've incorporated a memory test as part of their crash
>> dump. They were doing something clever with multiple XOR passes to
>> check the memory and still preserve the original data.
>>
>> At any rate, many hosting providers are not using ECC memory on their
>> servers, and as memory sizes keep getting larger and larger (and
>> process feature size for the memory itself smaller and smaller) the rate of
>> bit errors due to cosmic radiation keeps going up.
>>
>> The short version: also try to run a memory test on the server and see
>> if that turns up anything.
>>
>> James Graves
>>
>