On Fri, May 01, 2015 at 12:45:50PM -0700, Tim Hill wrote:
> 
> >> In my experience malloc does fail and quite often when running on a system
> >> with constrained memory. I see a lot of code with unchecked mallocs on
> >> github and I don't like it. If it's too much effort to check return values
> >> why not just wrapper malloc/calloc so OOM failures set a message and exit
> >> cleanly? Of course, it's not just open source code that has this issue.
> >> I've raised many tickets for expensive enterprise software with the same
> >> problem.
> > 
> > I've been following the development of Rust recently. One of the
> > disconcerting issues I recently discovered is that the Rust standard
> > libraries exit the process on malloc failure. I think that's really
> > unfortunate. There are many common scenarios (e.g. network daemons) where
> > recovering from malloc failure is comparatively easy and very much
> > advisable, at least if you use the appropriate patterns from the beginning
> > of development. It would be ludicrous to disconnect thousands of clients
> > just because 1 client request triggered resource exhaustion.
> > 
> > I realize it's a difficult problem to solve for them. Like Lua, Rust depends
> > on many hidden dynamic allocations. But Rust doesn't have exceptions, and
> > the try! and unwrap patterns were only recently settled upon as best
> > practice.
> > 
> 
> It's actually a *very* hard problem to solve in large blocks of C code.
> Typically there are many hidden allocations even within the CRT (even
> printf() frequently does this) and once you get one failure handling this
> cleanly without a cascade is exceedingly tricky.

Yes, printf has many different points of failure internally. But from the
caller's perspective you only need to check a single return value. (I would
also note that glibc is unique in that the underlying formatting code
heavily uses dynamic allocation. Most implementations only use dynamic
allocation for formatting doubles, or not at all. Which means that of all the
errors printf handles, OOM is the least problematic, as it can be in many cases.
When you try to handle memory failure correctly rather than disregarding it,
you tend to make decisions which ease the burden and which often improve
overall code quality. And in any event, glibc printf and snprintf do
handle OOM, because as a library interface that's expected and should be
expected!)
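
To illustrate the single-return-value point, here is a minimal sketch in C.
The helper name format_greeting is hypothetical; the only library behavior
assumed is the standard one: snprintf returns a negative value on error and
the would-be length when the output does not fit.

```c
#include <stdio.h>

/* Hypothetical helper (not from any library): format a greeting into buf.
 * Returns the number of characters written, or -1 on any failure. */
static int format_greeting(char *buf, size_t size, const char *name) {
    int n = snprintf(buf, size, "hello, %s", name);

    if (n < 0)
        return -1;      /* formatting failed internally, whatever the cause */
    if ((size_t)n >= size)
        return -1;      /* output was truncated; treat it as a failure too */
    return n;
}
```

However many ways the formatting code can fail inside, the caller sees one
value to test, and an allocation failure arrives the same way as any other
error.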

IME, if you follow RAII principles, it's not hard (difficult, complex) at
all. Rather, it's _tedious_. But those are entirely separate qualities.
Tedium is part-and-parcel of writing correct C code. Avoiding the necessary
tedium when working in C is one reason to integrate tools like Lua, where
the language constructs strike a different balance, especially wrt
resource management.

RAII isn't just a set of stylistic idioms for C++. It's about aggregating
resource acquisition into a smaller number of localized code blocks. That
reduces the burden of error checking, and ensures that there are fewer
places where your program can be in an inconsistent state. The benefit is
compounded when you adhere to the principle at all layers, so that the paths
where errors bubble up through call chains are also reduced.

You can apply the principle in any language, including C, and whether or not
your code is heavily object oriented.
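
As a rough sketch of what that can look like in plain C (struct client and
both function names are made up for illustration, not taken from any real
code base): all acquisition lives in one constructor, there is one failure
path, and the destructor accepts a partially built object.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-client state; every name here is illustrative. */
struct client {
    char *name;
    unsigned char *inbuf;
    size_t inbuf_size;
};

/* Safe to call on NULL and on a partially constructed object,
 * because free(NULL) is a no-op. */
static void client_destroy(struct client *c) {
    if (!c)
        return;
    free(c->inbuf);
    free(c->name);
    free(c);
}

/* All resource acquisition happens here. Callers check one return value.
 * inbuf_size is assumed to be nonzero. */
static struct client *client_create(const char *name, size_t inbuf_size) {
    struct client *c = malloc(sizeof *c);
    if (!c)
        return NULL;
    *c = (struct client){0};    /* members start out NULL/0 */

    size_t len = strlen(name) + 1;
    if (!(c->name = malloc(len)))
        goto fail;
    memcpy(c->name, name, len);

    if (!(c->inbuf = malloc(inbuf_size)))
        goto fail;
    c->inbuf_size = inbuf_size;

    return c;
fail:
    client_destroy(c);          /* releases whatever was acquired so far */
    return NULL;
}
```

A daemon built this way can refuse or drop the one client whose
client_create failed and keep serving the rest, since nothing is left half
initialized.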

> In some of our systems we shim malloc() and it's variants, and then
> pre-allocate a "reserve tank" of memory. If an allocation fails we
> release the "reserve tank" back to the heap and retry the allocation.
> However, even in this case we initiate a clean shutdown, since if you are
> in this high-stress condition that's pretty much all you can hope for.
> 
> In the case of Rust and large server systems, the assumption (which I
> think is reasonable) is that the server is running on a virtual memory OS
> with significant physical resources. In this case, running out of memory
> is a severe condition and bailing is a reasonable response. After all, why
> did it run out of memory? On a large system, in all probability, it's a
> memory leak and there is no way to get around that EXCEPT by a restart
> (and, of course, trying to fix the leak).

IME those are not very good assumptions. Memory leaks are not very high on
my list of culprits when it comes to resource exhaustion, including memory
exhaustion. If you rigorously adhere to disciplines like single owner (a
foundational assumption of Rust, BTW), then you avoid the kind of complex
code that favors memory leaks. Copy-by-value and immutability are similar
strategies (the latter also heavily adopted by Rust).

Your assumptions may very well hold true in your environments. I just think
it's wrong to assume that the environments you work in are as common as you
think, even _if_ they're the most common universally. (Which, for the sake
of argument, I'll grant you.)