- Subject: Drawing the line between speed and simplicity/elegance
- From: Andrew Starks <andrew.starks@...>
- Date: Thu, 7 May 2015 15:28:33 -0500
On Wed, May 6, 2015 at 10:38 PM, Brigham Toskin <firstname.lastname@example.org> wrote:
> I apologize if this is kinda long; I'll try to compress. First, my
> I've been working on a stack-based language, implemented in Lua. Being a C++
> programmer, my first impulse was to wrap an array table with some ADT
> metamethods. After some futzing, my stack was relatively fast, but fairly
> complicated for what it ultimately is—a LIFO.
> Recently, someone pointed out that you could do it with much less code by
> wrapping Lua's call stack in a coroutine, manipulating and returning values
> in response to external inputs. The simplicity of the design and the
> realization that the Lua devs must have implemented a much faster stack than
> I ever could led me to explore this space. To my surprise, my first
> prototype was actually 50% slower than the original, under Lua 5.2.3. After
> thinking about what was going on and profiling several iterations, I have
> what is still a simple and (I think) very clean and elegant solution,
> utilizing a handful of mutually-tail-recursive continuations inside a
> coroutine, and it's about 13% faster than the ADT!
> Sounds like a win, right? If I run the tests in LuaJIT (2.0.2 or 2.0.3), the
> newest prototype is even five times faster than under vanilla Lua. But, it's
> two orders of magnitude slower than the jit'ed ADT-style code. Now, this is
> still an improvement over the first prototype, which was *three* orders of
> magnitude slower than the jit'ed ADT code, but it still ain't great. I very
> strongly suspect (after looking at the -jdump) that the heavy use of
> switching coroutine contexts is foiling the compiler's ability to trace (and
> thus, optimize) the code, and I don't see a fix.
> The very specific question: Do we see a workaround, optimization, or perhaps
> an alternative implementation, which circumvents what I think is a
> limitation of how LuaJIT analyzes Lua code? I can provide github links to
> different versions of my code, if anyone thinks it will help, but I'm pretty
> sure "it's a coroutine" is a good starting place.
> The more general question: Where do we draw the line between writing simple
> code, and performance? Or phrased another way, how slow is too slow, for the
> sake of an elegant design? When I optimized the ADT code, it got uglier and
> more complex. When I optimized the coroutine prototype, it got simpler and
> more elegant.
> Brigham Toskin
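For anyone following along without the code in front of them, the two designs described above might look roughly like this. It is only a minimal sketch, not Brigham's actual implementation: the names Stack, CoStack, frame and base are made up for illustration, and the coroutine version uses plain nested calls rather than the mutually tail-recursive continuations he mentions.

```lua
-- 1) ADT style: an array table plus an explicit size counter.
local Stack = {}
Stack.__index = Stack

function Stack.new()
  return setmetatable({ n = 0 }, Stack)
end

function Stack:push(v)
  self.n = self.n + 1
  self[self.n] = v
end

function Stack:pop()
  local n = self.n
  if n == 0 then return nil end   -- empty stack
  local v = self[n]
  self[n] = nil
  self.n = n - 1
  return v
end

-- 2) Coroutine style: each pushed value lives in a local of a Lua call
--    frame inside the coroutine, so Lua's own call stack is the LIFO.
local function frame(value)
  local popped                    -- value handed back by a child frame
  while true do
    local op, arg = coroutine.yield(popped)
    popped = nil
    if op == "push" then
      popped = frame(arg)         -- returns only once `arg` is popped
    elseif op == "pop" then
      return value                -- unwind one frame
    end
  end
end

local function base()
  local popped
  while true do
    local op, arg = coroutine.yield(popped)
    popped = nil
    if op == "push" then popped = frame(arg) end
    -- "pop" on an empty stack falls through and yields nil
  end
end

local function CoStack()
  local co = coroutine.wrap(base)
  co()                            -- prime: run to the first yield
  return {
    push = function(v) co("push", v) end,
    pop  = function() return co("pop") end,
  }
end
```

Every push and pop in the second version is a resume/yield pair, which is the context switching suspected above of getting in the way of LuaJIT's tracing.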
I'm reading along and am interested in the answers that you get. I'm curious about your use case. Is it a scenario where *any* cycles devoted to Lua would be better devoted elsewhere?
I assume that the optimizations you're after are genuinely required and not folly. What follows is my own tale, and I don't assume that it applies here...
When we picked Lua for our media framework, we were very worried about performance. At a much higher level than yours, I explored the same questions and coded accordingly. Then, when we integrated it with our drivers and the rest of the software, we quickly saw that it didn't matter. At all. We couldn't make Lua throw enough garbage away or do enough table lookups to compete with the run-of-the-mill latency that happens with threading or when working with media files.