
On 9/29/2017 5:37 AM, Dibyendu Majumdar wrote:
[snipped all]
> However, looking at the generated C code made me think ... perhaps it
> is worth trying to write a hand-coded JIT compiler. The thing about
> the generated code is that there is not a lot of stack usage
I am a little puzzled here: "there is not a lot of stack usage".
(I only glanced at the C example you posted not long ago, and I
have never looked into Ravi in detail.) If, as [1] suggests, the
usual Lua value stack is still used for normal Lua code, then it is
true that there is not a lot of C stack usage. But the really big
wins come from bypassing the Lua value stack altogether.
[1] 
http://the-ravi-programming-language.readthedocs.io/en/latest/ravi-jit-status.html
Looking at [2], "Ravi Int" (interpreted?) has practically the same 
performance as Lua 5.3.2. So, keeping to the Lua value stack will 
always be a problem -- you can never use the full power of a 
modern CPU with this code. Higher Dcache load; you can't pack more 
useful work into the instruction stream and Icache; the complexity 
of the Lua value stack, as seen by the CPU, means predictive data 
fetching might not help; superscalar operation/scheduling may 
suffer; physical registers are under-utilized; branch prediction 
may suffer; the stack engine can't come into play; you can't have 
small segments of hot code that fit into the final trace/loop 
buffer feeding the parallel instruction decoders; all those sorts 
of things. The cost of cache misses at the lower cache levels is 
very high. Given something one can write as a lean-and-mean C 
function, anything different will suffer to various degrees.
For fornum_test1.lua, the 0.309 looks weird -- I don't quite 
believe it. Was the "j = i" optimized away, either by LLVM or 
dynamically by the CPU? This should be investigated. If so, it's 
not valid as a benchmark test, or you should add a footnote.
Same with fornum_test3.lua: you have a "j = k" there, with the 
same potential problem. The disparity between the 4.748 of 
Ravi(LLVM) and the 16.74 of LuaJIT2.1 Int is suspicious. LuaJIT is 
very good, and Ravi(LLVM) wipes it out? This should be 
investigated. If something is optimized away entirely, then it is 
an unfair benchmark.
[2] 
http://the-ravi-programming-language.readthedocs.io/en/latest/ravi-benchmarks.html
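Just to illustrate the pattern I mean (I have not looked at the 
actual fornum_test1.lua source, so this is only a sketch, not the 
real benchmark): a fornum loop whose body is a dead store can be 
collapsed by a good optimizer, so the loop itself is never really 
measured.

    -- hypothetical sketch, not the real benchmark source: the body is
    -- a dead store, so LLVM (or the CPU, dynamically) may skip most of
    -- the work
    local function fornum_like(n)
      local j = 0
      for i = 1, n do
        j = i            -- j is never read inside the loop
      end
      return j           -- reducible to j = n (for n >= 1), no loop needed
    end
    print(fornum_like(100))   -- prints 100

If the timing of such a loop looks "too good", that is the first 
thing I would check.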
The rest of the Ravi(LLVM) and LuaJIT2.1 Int benchmark results are 
comparable, in the ballpark. It's a dilemma with such languages: 
the more annotations and/or constraints that are added or inferred, 
the closer we get to a lean-and-mean C function. Even so, if we 
want to approach the performance of the best math libraries, then 
there is a lot more to do: SSE*, AVX, instruction scheduling, cache 
considerations, data behaviour that the prefetch predictor can pick 
up, branching behaviour that can help the branch predictor, etc. 
etc.
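For example (a hedged sketch from memory -- please check the Ravi 
docs for the exact annotation syntax), it is the optional type 
annotations that let the JIT treat a loop as plain unboxed 
integer/double arithmetic, much like the C a compiler would emit:

    -- hedged sketch of Ravi's optional static typing (syntax written
    -- from memory, not checked against the current docs); plain Lua 5.3
    -- would also run this if the ": integer" / ": number" annotations
    -- were removed
    function sum(n: integer)
      local total: number = 0.0
      for i = 1, n do
        total = total + i   -- i and total can stay unboxed, in registers
      end
      return total
    end

Without that kind of constraint, every operation has to go through 
tagged values and type checks.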
At that point, I am quite happy to stick to standard Lua plus C 
libraries; that's why I never started on any serious JIT stuff, ha 
ha. :-)
Also, consider all the things that have been done on JavaScript. A 
few years down the line, all the JS stuff will be moot when 
WebAssembly matures.
> - hence register allocation is perhaps somewhat easier (i.e. mostly
> static allocation, with some dynamic allocation).
>
> So just for fun and as a learning exercise, I am planning to start a
> new JIT where I will use the excellent 'dynasm' product from Mike
> Pall. Only X86-64 to start with. If anyone wants to join in this
> adventure, you are welcome.
--
Cheers,
Kein-Hong Man (esq.)
Selangor, Malaysia