On 9/29/2017 4:27 PM, Dibyendu Majumdar wrote:
Hi Kein-Hong ,

On 29 September 2017 at 04:32, KHMan wrote:
On 9/29/2017 5:37 AM, Dibyendu Majumdar wrote:

[snipped all]
However, looking at the generated C code made me think ... perhaps it
is worth trying to write a hand-coded JIT compiler. The thing about
the generated code is that there is not a lot of stack usage


I am a little puzzled here: "there is not a lot of stack usage". (I only
glanced at the C example you posted not long ago, and I have never looked
into Ravi in detail.) If (via [1]) the usual value stack is used for normal
Lua code, then it's true there is not a lot of C stack usage. But the really
big wins happen when we bypass the Lua value stack.


I think using the C stack for Lua code execution is quite hard to do
... as anytime an operation can call something then the values need to
be flushed to Lua stack, and read back after the call. I am not sure
whether the cost of this would be justified except for functions that
have no calls (unlikely). Note that calls here mean anything going out
of the VM loop - such as table operations or metamethods, and not just
actual function calls.

Agreed. When it comes to implementing full Lua functionality, there is no easy way to make things super fast.
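The flush-and-reload cost described above can be sketched in C. This is only an illustration under stated assumptions: `ValueStack`, `escaping_op` and `sum_with_call` are hypothetical names, not part of Lua or Ravi, and `escaping_op` stands in for anything that leaves the VM loop (a table operation, a metamethod, a call).

```c
#include <assert.h>

/* Hypothetical stand-in for the Lua value stack (not the real Lua type). */
typedef struct { long slots[8]; } ValueStack;

/* Hypothetical operation that leaves compiled code and may read or
   modify the value stack, much like a metamethod could. */
static void escaping_op(ValueStack *vs) { vs->slots[0] += 100; }

long sum_with_call(ValueStack *vs, int n) {
    assert(n >= 0);
    long acc = vs->slots[0];        /* cache slot 0 in a C local */
    for (int i = 0; i < n; i++) {
        acc += i;                   /* fast path: no stack traffic */
        if (i % 4 == 3) {
            vs->slots[0] = acc;     /* flush before leaving compiled code */
            escaping_op(vs);
            acc = vs->slots[0];     /* reload: the callee may have changed it */
        }
    }
    vs->slots[0] = acc;             /* final flush on exit */
    return acc;
}
```

The more often the loop body has to leave compiled code, the more the flush and reload pairs dominate, which is the concern raised above about functions that are not call-free.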

Looking at [2], "Ravi Int" (interpreted?) has practically the same
performance as Lua 5.3.2.

Ravi's interpreter performance for standard Lua code is slightly worse
than Lua's. I think this is due to a) a larger VM and b) additional branching,
as Ravi has 2 additional table sub types. When using type annotations,
depending upon the benchmark, the interpreter does better than Lua,
but the difference is not that great. The improvement only becomes
greater when the type annotated code is JITed.

For fornum_test1.lua, the 0.309 looks weird -- I don't quite believe it. Was
the "j = i" optimized away by either LLVM or dynamically by the CPU? Should
be investigated. If so, it's not valid as a benchmark test, or you should
add a footnote.


I think the loop is still being executed. libgccjit produces a number
like 0.001 which is because it eliminates the loop entirely. But then
libgccjit performs worse in other cases.

Thanks for the clarification; in that case I withdraw my doubts. Given the minimal nature of fornum_test1.lua and fornum_test3.lua, I would love to see them compared to a C baseline, assuming a compiler can produce proper code, say by adding volatile to force an actual store to memory. Again, the problem is that this benchmark would be extremely sensitive to certain things.
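A minimal sketch of such a C baseline follows. It assumes the benchmark is essentially a counted loop whose body is "j = i"; the actual contents of fornum_test1.lua are not reproduced here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Plain version: only the last store to j is observable, so an
   optimizing compiler is free to remove the loop entirely. */
int64_t loop_plain(int64_t n) {
    assert(n >= 0);
    int64_t j = 0;
    for (int64_t i = 1; i <= n; i++) {
        j = i;
    }
    return j;
}

/* Volatile version: every store to j is an observable access, so the
   compiler must perform all n stores and cannot drop the loop. */
int64_t loop_volatile(int64_t n) {
    assert(n >= 0);
    volatile int64_t j = 0;
    for (int64_t i = 1; i <= n; i++) {
        j = i;
    }
    return j;
}
```

Timing both versions at the same optimization level would show how much of a measured result comes from doing the work and how much from eliminating it.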

Short benchmarks can be a problem: they amplify certain behaviour greatly, which is good if one is deliberately targeting that behaviour. But if libgccjit eliminates the loop, it is not doing the same amount of work. Having separate effort benchmarks and optimization benchmarks might be better in such cases.

Same with fornum_test3.lua, you have a "j = k" there, same potential
problem. The disparity between the 4.748 of Ravi(LLVM) and the 16.74 of
LuaJIT2.1 Int is suspicious. LuaJIT is very good and Ravi(LLVM) wipes it
out? Should be investigated. If something is optimized away entirely, then
it is an unfair benchmark.

Here LuaJIT is suffering from the lack of predictability in branching
I think. I have not investigated but I suspect it isn't using JIT due
to this.

Thanks for the clarification. Without data I was just shooting in the dark :-) Sorry about the shooting.

At that point, I am quite happy to stick to standard Lua plus C libraries,
that's why I never started on any serious JIT stuff, ha ha. :-)

I agree and that is my conclusion as well as I posted earlier this
year. I have rearranged my own code so that scripting is used to
configure - but the performance sensitive parts are written in C/C++.
I now think that Lua's performance is adequate for most use cases. And
given Python's immense popularity, it is clear that performance in
scripting languages isn't the only criterion for success.

So at this point my efforts in Ravi are more for fun and learning.

I for one greatly appreciate the experimentation efforts, and I'm sure many of us feel the same. It is always good to have some data comparing Lua, LuaJIT, Ravi, etc. The other path taken by some folks here and on other lists is to have a scripting part (Lua) and a compiled part (tcc or other), much like how one programs a GPU.

JavaScript has had huge amounts of resources thrown at it, and even then the very fast stuff like asm.js adds constraints. And with WebAssembly, static typing is back in business. Oh wait, maybe it's the core of Java, modernized, born again (eventually, if they add garbage collection).

--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia