- Subject: Re: Starting a JIT backend for Ravi yet again
- From: KHMan <keinhong@...>
- Date: Fri, 29 Sep 2017 11:32:22 +0800
On 9/29/2017 5:37 AM, Dibyendu Majumdar wrote:
[snipped all]
> However, looking at the generated C code made me think ... perhaps it
> is worth trying to write a hand-coded JIT compiler. The thing about
> the generated code is that there is not a lot of stack usage
I am a little puzzled here: "there is not a lot of stack usage".
(I only glanced at the C example you posted not long ago, and I
have never looked into Ravi in detail.) If (via [1]) the usual
value stack is used for normal Lua code, then it's true there is
not a lot of C stack usage. But the really big wins happen when we
bypass the Lua value stack.
[1]
http://the-ravi-programming-language.readthedocs.io/en/latest/ravi-jit-status.html
Looking at [2], "Ravi Int" (interpreted?) has practically the same
performance as Lua 5.3.2. So, keeping to the Lua value stack will
always be a problem -- you can never use the full power of a
modern CPU with this code. Higher Dcache load; fewer useful things
fit into the instruction stream and Icache; the indirection of the
Lua value stack means predictive data fetching might not help;
superscalar operation/scheduling may suffer; physical registers
are under-utilized; branch prediction may suffer; the stack engine
never comes into play; you can't have small segments of hot code
that fit into the final trace/loop buffer feeding the parallel
instruction decoders -- all those sorts of things. The cost of
cache misses at the lower levels is very high. Given something one
can write as a lean-and-mean C function, anything different will
suffer to various degrees.
For fornum_test1.lua, the 0.309 result looks weird -- I don't
quite believe it. Was the "j = i" optimized away, either by LLVM
or dynamically by the CPU? This should be investigated. If so,
it's not valid as a benchmark test, or you should add a footnote.
Same with fornum_test3.lua: it has a "j = k" with the same
potential problem. The disparity between 4.748 for Ravi(LLVM) and
16.74 for LuaJIT2.1 Int is suspicious. LuaJIT is very good -- does
Ravi(LLVM) really wipe it out? This should be investigated. If
something is optimized away entirely, then it is an unfair
benchmark.
[2]
http://the-ravi-programming-language.readthedocs.io/en/latest/ravi-benchmarks.html
The rest of the Ravi(LLVM) and LuaJIT2.1 Int benchmark results are
comparable, in the same ballpark. It's a dilemma with such
languages: the more annotations and/or constraints that are added
or inferred, the closer we get to a lean-and-mean C function. Even
so, if we want to approach the performance of the best math
libraries, there is a lot more to do: SSE*, AVX, instruction
scheduling, cache considerations, data access patterns the
prefetch predictor can pick up, branching behaviour that helps the
branch predictor, etc. etc.
At that point, I am quite happy to stick to standard Lua plus C
libraries, that's why I never started on any serious JIT stuff, ha
ha. :-)
Also, consider all the things that have been done on JavaScript. A
few years down the line, all the JS stuff will be moot when
WebAssembly matures.
> - hence
> register allocation is perhaps somewhat easier (i.e. mostly static
> allocation, with some dynamic allocation). So just for fun and as a
> learning exercise, I am planning to start a new JIT where I will use
> the excellent 'dynasm' product from Mike Pall. Only X86-64 to start
> with. If anyone wants to join in this adventure, you are welcome.
--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia