lua-users home
lua-l archive



>> In my experience malloc does fail, and quite often, when running on a system
>> with constrained memory. I see a lot of code with unchecked mallocs on
>> GitHub and I don't like it. If it's too much effort to check return values,
>> why not just wrap malloc/calloc so OOM failures set a message and exit
>> cleanly? Of course, it's not just open source code that has this issue.
>> I've raised many tickets for expensive enterprise software with the same
>> problem.
> 
> I've been following the development of Rust recently. One of the
> disconcerting issues I recently discovered is that the Rust standard
> libraries exit the process on malloc failure. I think that's really
> unfortunate. There are many common scenarios (e.g. network daemons) where
> recovering from malloc failure is comparatively easy and very much
> advisable, at least if you use the appropriate patterns from the beginning
> of development. It would be ludicrous to disconnect thousands of clients
> just because one client request triggered resource exhaustion.
> 
> I realize it's a difficult problem to solve for them. Like Lua, Rust depends
> on many hidden dynamic allocations. But Rust doesn't have exceptions, and
> the try! and unwrap patterns were only recently settled upon as best
> practice.
> 

It’s actually a *very* hard problem to solve in large blocks of C code. Typically there are many hidden allocations, even within the CRT (printf() frequently allocates, for example), and once you get one failure, handling it cleanly without triggering a cascade of further failures is exceedingly tricky. In some of our systems we shim malloc() and its variants and pre-allocate a “reserve tank” of memory. If an allocation fails, we release the “reserve tank” back to the heap and retry the allocation. Even in this case, however, we initiate a clean shutdown, since once you are in this high-stress condition that’s pretty much all you can hope for.
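A minimal sketch of that reserve-tank shim might look like the following. The names (xmalloc, reserve_init, begin_clean_shutdown) and the reserve size are illustrative assumptions, not from any particular codebase:

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative "reserve tank" allocator shim: pre-allocate an
   emergency block, and on malloc() failure release it, retry once,
   and begin an orderly shutdown. */

#define RESERVE_SIZE (1u << 20)   /* 1 MiB emergency reserve (arbitrary) */

static void *reserve_tank = NULL;
static int shutting_down = 0;

void reserve_init(void)
{
    /* Call once at startup, while memory is still plentiful. */
    reserve_tank = malloc(RESERVE_SIZE);
}

static void begin_clean_shutdown(void)
{
    /* In a real system: flush logs, close sockets, persist state, ... */
    shutting_down = 1;
    fprintf(stderr, "out of memory: initiating clean shutdown\n");
}

void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL && reserve_tank != NULL) {
        /* Release the reserve back to the heap and retry once. */
        free(reserve_tank);
        reserve_tank = NULL;
        p = malloc(n);
        /* Even if the retry succeeds, the process is under severe
           memory pressure; an orderly exit is all we aim for now. */
        begin_clean_shutdown();
    }
    if (p == NULL) {
        fprintf(stderr, "out of memory: aborting\n");
        exit(EXIT_FAILURE);
    }
    return p;
}
```

The point of the retry is only to buy enough headroom to shut down cleanly (the hidden CRT allocations mentioned above still need to succeed during teardown), not to keep servicing requests.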

In the case of Rust and large server systems, the assumption (which I think is reasonable) is that the server is running on a virtual memory OS with significant physical resources. In this case, running out of memory is a severe condition and bailing is a reasonable response. After all, why did it run out of memory? On a large system, in all probability, it’s a memory leak and there is no way to get around that EXCEPT by a restart (and, of course, trying to fix the leak).

—Tim