Looks like you're talking about me...
On 07/05/15 12:38 AM, Brigham Toskin wrote:
I apologize if this is kinda long; I'll try to compress. First, my situation:
I've been working on a stack-based language, implemented in Lua. Being a C++ programmer, my first impulse was to wrap an array table with some ADT metamethods. After some futzing, my stack is relatively fast, but fairly complicated for what it ultimately is: a LIFO.
Recently, someone pointed out that you could do it with much less code by wrapping Lua's call stack in a coroutine, manipulating and returning values in response to external inputs. The simplicity of the design and the realization that the Lua devs must have implemented a much faster stack than I ever could led me to explore this space. To my surprise, my first prototype was actually 50% slower than the original, under Lua 5.2.3. After thinking about what was going on and profiling several iterations, I have what is still a simple and (I think) very clean and elegant solution, utilizing a handful of mutually-tail-recursive continuations inside a coroutine, and it's about 13% faster than the ADT!
Sounds like a win, right? If I run the tests in LuaJIT (2.0.2 or 2.0.3), the newest prototype is even five times faster than under vanilla Lua. But, it's two orders of magnitude slower than the jit'ed ADT-style code. Now, this is still an improvement over the first prototype, which was *three* orders of magnitude slower than the jit'ed ADT code, but it still ain't great. I very strongly suspect (after looking at the -jdump) that the heavy use of switching coroutine contexts is foiling the compiler's ability to trace (and thus, optimize) the code, and I don't see a fix.
The very specific question: Do we see a workaround, optimization, or perhaps an alternative implementation, which circumvents what I think is a limitation of how LuaJIT analyzes Lua code? I can provide github links to different versions of my code, if anyone thinks it will help, but I'm pretty sure "it's a coroutine" is a good starting place.
The more general question: Where do we draw the line between writing simple code, and performance? Or phrased another way, how slow is too slow, for the sake of an elegant design? When I optimized the ADT code, it got uglier and more complex. When I optimized the coroutine prototype, it got simpler and more elegant.
First of all you should AVOID AT ALL COSTS using coroutines in Lua(JIT): they're slow. As can be seen here, I don't use them, so you too can avoid them.
Second, there's no arguing against KISS. This is how I call Lua functions. To make a wrapper around a Lua function you just write `function(word, i, arg1, arg2, arg3, etc, ...) local v1, v2, v3 = f(arg1, arg2, arg3, etc) return word, i, v1, v2, v3, ... end`, and that's it!
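To make that concrete, here's a rough sketch for the two-arguments, one-result case. The names `wrap2to1` and `add` are made up for this example, and I'm assuming `word` and `i` are just values the interpreter threads through every call; the rest of the stack rides along in the varargs, top first.

```lua
-- Hypothetical helper (not from any library): wrap a plain Lua function `f`
-- that takes two arguments and returns one result.
local function wrap2to1(f)
  return function(word, i, a, b, ...)
    -- `a` and `b` are the top two stack values; `...` is everything below them.
    local r = f(a, b)
    -- Pass `word` and `i` through, push the result, keep the rest of the stack.
    return word, i, r, ...
  end
end

-- Example word: add the top two values.
local add = wrap2to1(function(a, b) return a + b end)

-- Stack is (1, 2, 10, 20), top first; "add" and 1 stand in for word and i.
print(add("add", 1, 1, 2, 10, 20))
```

The stack never lives in a table here: it's just the argument list, so pushing and popping is plain argument passing.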
Third... avoid `select(var, ...)`, LuaJIT doesn't like it.
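A small sketch of what I mean, with made-up function names. The first version indexes the varargs with a changing `k`; the second peels values off with a named parameter instead. I haven't benchmarked these two against each other, so treat it as an illustration of the pattern, not a measurement.

```lua
-- The pattern to avoid: select() with a non-constant first argument.
local function sum_select(...)
  local total = 0
  for k = 1, select("#", ...) do
    -- select(k, ...) returns every value from position k on;
    -- inside the expression only the first one is used.
    total = total + select(k, ...)
  end
  return total
end

-- Alternative: take one value per call through a named parameter and
-- tail-call with the rest. Assumes the values contain no nil.
local function sum_peel(acc, x, ...)
  if x == nil then return acc end
  return sum_peel(acc + x, ...)
end

print(sum_select(1, 2, 3, 4), sum_peel(0, 1, 2, 3, 4))
```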
Disclaimer: these emails are public and can be accessed from <TODO: get a non-DHCP IP and put it here>. If you do not agree with this, DO NOT REPLY.