lua-users home
lua-l archive


On 9/29/2017 5:37 AM, Dibyendu Majumdar wrote:
[snipped all]
> However, looking at the generated C code made me think ... perhaps it
> is worth trying to write a hand-coded JIT compiler. The thing about
> the generated code is that there is not a lot of stack usage

I am a little puzzled here: "there is not a lot of stack usage". (I only glanced at the C example you posted not long ago, and I have never looked into Ravi in detail.) If (via [1]) the usual value stack is used for normal Lua code, then it's true there is not a lot of C stack usage. But the really big wins happen when we bypass the Lua value stack.


Looking at [2], "Ravi Int" (interpreted?) has practically the same performance as Lua 5.3.2. So keeping to the Lua value stack will always be a problem -- you can never use the full power of a modern CPU with this code: higher Dcache load; fewer useful instructions fitting into the instruction stream and Icache; the indirection of the Lua value stack means hardware data prefetching might not help; superscalar operation/scheduling may suffer; physical registers are under-utilized; branch prediction may suffer; the stack engine cannot come into play; hot code cannot be kept small enough to fit into the trace/loop buffer feeding the parallel instruction decoders; all those sorts of things. The cost of cache misses at the lower cache levels is very high. Given something one can write as a lean-and-mean C function, anything different will suffer to various degrees.

For fornum_test1.lua, the 0.309 looks weird -- I don't quite believe it. Was the "j = i" optimized away, either statically by LLVM or dynamically by the CPU? This should be investigated. If so, it is not valid as a benchmark test, or you should at least add a footnote.

Same with fornum_test3.lua: the "j = k" there has the same potential problem. The disparity between the 4.748 of Ravi(LLVM) and the 16.74 of LuaJIT 2.1 Int is suspicious -- LuaJIT is very good, and Ravi(LLVM) wipes it out? This should be investigated. If something is optimized away entirely, then it is an unfair benchmark.


The rest of the Ravi(LLVM) and LuaJIT 2.1 Int benchmark results are comparable, in the same ballpark. It's a dilemma with such languages: the more annotations and/or constraints that are added or inferred, the closer we get to a lean-and-mean C function. Even so, if we want to approach the performance of the best math libraries, there is a lot more to do: SSE*, AVX, instruction scheduling, cache considerations, data access patterns the prefetch predictor can pick up, branching behaviour that helps the branch predictor, etc. etc.

At that point, I am quite happy to stick to standard Lua plus C libraries, that's why I never started on any serious JIT stuff, ha ha. :-)

Also, consider all the things that have been done on JavaScript. A few years down the line, all the JS stuff will be moot when WebAssembly matures.

> - hence register allocation is perhaps somewhat easier (i.e. mostly
> static allocation, with some dynamic allocation).
>
> So just for fun and as a learning exercise, I am planning to start a
> new JIT where I will use the excellent 'dynasm' product from Mike
> Pall. Only X86-64 to start with. If anyone wants to join in this
> adventure, you are welcome.

Kein-Hong Man (esq.)
Selangor, Malaysia