Re: Starting a JIT backend for Ravi yet again

lua-l archive


Subject: Re: Starting a JIT backend for Ravi yet again
From: KHMan <keinhong@...>
Date: Fri, 29 Sep 2017 17:33:34 +0800

On 9/29/2017 4:27 PM, Dibyendu Majumdar wrote:

Hi Kein-Hong,

On 29 September 2017 at 04:32, KHMan wrote:

On 9/29/2017 5:37 AM, Dibyendu Majumdar wrote:


[snipped all]
However, looking at the generated C code made me think ... perhaps it
is worth trying to write a hand-coded JIT compiler. The thing about
the generated code is that there is not a lot of stack usage



I am a little puzzled here: "there is not a lot of stack usage". (I only
glanced at the C example you posted not long ago, and I have never looked
into Ravi in detail.) If (via [1]) the usual value stack is used for normal
Lua code, then it's true there is not a lot of C stack usage. But the really
big wins happen when we bypass the Lua value stack.


I think using the C stack for Lua code execution is quite hard to do
... as anytime an operation can call something then the values need to
be flushed to Lua stack, and read back after the call. I am not sure
whether the cost of this would be justified except for functions that
have no calls (unlikely). Note that calls here mean anything going out
of the VM loop - such as table operations or metamethods, and not just
actual function calls.

Agree, when it comes to implementing full Lua functionality there
is no easy way of making things super fast.
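
To make the flush-and-reload cost described above concrete, here is a minimal C sketch. Everything in it (ToyStack, ToyCallout, sum_with_callouts, double_slot0) is a made-up stand-in for illustration, not Lua's or Ravi's actual data structures or API. It only shows the shape of the problem: a value cached in a C local has to be written back before anything that leaves the loop, and read again afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a value stack; NOT Lua's or Ravi's real types. */
typedef struct { long long slots[8]; } ToyStack;

/* Hypothetical "call out of the VM loop": it may read or write any slot,
   much as a metamethod or table operation could. */
typedef void (*ToyCallout)(ToyStack *stack);

/* Adds 1..n to slot 0, keeping the running value in a C local.
   If a callout is supplied, the local is flushed before each call
   and reloaded after it, because the callee may have changed the slot. */
static long long sum_with_callouts(ToyStack *stack, long long n,
                                   ToyCallout callout) {
    assert(stack != NULL);
    long long acc = stack->slots[0];      /* cache the slot in a C local */
    for (long long i = 1; i <= n; i++) {
        acc += i;
        if (callout != NULL) {
            stack->slots[0] = acc;        /* flush before leaving the loop */
            callout(stack);
            acc = stack->slots[0];        /* reload after the call returns */
        }
    }
    stack->slots[0] = acc;                /* final write-back */
    return acc;
}

/* Example callout that modifies the slot behind the caller's back. */
static void double_slot0(ToyStack *stack) {
    stack->slots[0] *= 2;
}
```

With no callout the loop can keep acc in a register the whole time; with one, every iteration pays for a store and a load, which is the cost being weighed in the quoted paragraph.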

Looking at [2], "Ravi Int" (interpreted?) has practically the same
performance as Lua 5.3.2.


Ravi's interpreter performance for standard Lua code is slightly worse
than Lua. This is I think due to a) larger VM, b) additional branching
as Ravi has 2 additional table sub types. When using type annotations,
depending upon the benchmark, the interpreter does better than Lua,
but the difference is not that great. The improvement only becomes
greater when the type annotated code is JITed.

For fornum_test1.lua, the 0.309 looks weird -- I don't quite believe it. Was
the "j = i" optimized away by either LLVM or dynamically by the CPU? Should
be investigated. If so, it's not valid as a benchmark test, or you should
add a footnote.


I think the loop is still being executed. libgccjit produces a number
like 0.001 which is because it eliminates the loop entirely. But then
libgccjit performs worse in other cases.

Thanks for the clarification, in which case I would/should
withdraw my doubts. Given the minimal nature of fornum_test1.lua
and fornum_test3.lua, I would love to see them compared to a C
baseline, assuming a compiler can produce proper code, say add
volatile to force an actual store to memory? Again, a problem is
this benchmark would be extremely sensitive to certain things.

Short benchmarks can be a problem, it amplifies certain behaviour
greatly, good if one is properly targeting certain things. But if
libgccjit eliminates the loop, it wouldn't be the same amount of
effort done. Having effort benchmarks and optimization benchmarks
might be better in such cases.
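
A C baseline along those lines might look like the sketch below. It assumes the loop body is nothing more than "j = i", as in the discussion of fornum_test1.lua; the function names are hypothetical and the iteration count is left to the caller, since the actual benchmark parameters are not given here. The volatile version forces a real store on every iteration, while an optimizing compiler is free to reduce the plain version to a single assignment.

```c
#include <assert.h>
#include <time.h>

/* "Effort" variant: volatile forces an actual store to memory on every
   iteration, so the loop cannot be removed by the optimizer. */
static long long run_loop_volatile(long long n) {
    assert(n >= 0);
    volatile long long j = 0;
    for (long long i = 1; i <= n; i++) {
        j = i;
    }
    return j;
}

/* "Optimization" variant: same source loop without volatile. A compiler
   may legally collapse it, which is presumably what happens when a
   backend reports a near-zero time. */
static long long run_loop_plain(long long n) {
    assert(n >= 0);
    long long j = 0;
    for (long long i = 1; i <= n; i++) {
        j = i;
    }
    return j;
}

/* Times one call of the given loop function using CPU time from clock(). */
static double time_loop_seconds(long long (*loop)(long long), long long n) {
    clock_t start = clock();
    (void)loop(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Timing both variants with the same n, e.g. time_loop_seconds(run_loop_volatile, n) against time_loop_seconds(run_loop_plain, n), gives an effort number and an optimization number side by side. Results will still be very sensitive to compiler flags and the CPU.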

Same with fornum_test3.lua, you have a "j = k" there, same potential
problem. The disparity between the 4.748 of Ravi(LLVM) and the 16.74 of
LuaJIT2.1 Int is suspicious. LuaJIT is very good and Ravi(LLVM) wipes it
out? Should be investigated. If something is optimized away entirely, then
it is an unfair benchmark.


Here LuaJIT is suffering from the lack of predictability in branching
I think. I have not investigated but I suspect it isn't using JIT due
to this.

Thanks for the clarification. Without data I was just shooting in
the dark :-) Sorry about the shooting.

At that point, I am quite happy to stick to standard Lua plus C libraries,
that's why I never started on any serious JIT stuff, ha ha. :-)


I agree and that is my conclusion as well as I posted earlier this
year. I have rearranged my own code so that scripting is used to
configure - but the performance sensitive parts are written in C/C++.
I now think that Lua's performance is adequate for most use cases. And
given Python's immense popularity, it is clear that performance in
scripting languages isn't the only criteria for success.

So at this point my efforts in Ravi are more for fun and learning.

I for one greatly appreciate the experimentation efforts, I'm sure
many of us feel the same. Always good to have some data to compare
Lua, LuaJIT and Ravi, etc. The other path taken by some folks here
and on other lists is to have a scripting bit (Lua) and a
compiling bit (tcc or other), much like how one programs a GPU.

JavaScript has had huge amounts of resources thrown at it, and
even then the very fast stuff like asm.js adds constraints. And
with WebAssembly static typing is back in business. Oh wait, maybe
it's the core of Java, modernized, born again (eventually, if they
add garbage collection.)


--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia

