- Subject: Re: Recent Lua commits
- From: Dibyendu Majumdar <mobile@...>
- Date: Thu, 21 Sep 2017 19:05:27 +0100
On 3 August 2017 at 14:12, Dibyendu Majumdar <firstname.lastname@example.org> wrote:
> I had a quick look at the recent commits to Lua
> (https://github.com/lua/lua), and was curious to see the presence of
> specialized bytecodes for certain types of table access operations.
> Also noted the ADDI instruction (although not sure the benefit of this
> in the absence of static type information). I would be interested to
> know if any performance tests have been done to measure improvements
> resulting from these specialized bytecodes.
I took a deeper look at some of the changes. In three of the
benchmarks I use, the new version performs much better -
congratulations!
The figures I obtain on Windows 10 are:
old: 48.98 sec
new: 38.3 sec
old: 21.21 sec
new: 18.66 sec
matrix multiply 1000:
old: 27.4 sec
new: 22.62 sec
I haven't checked exactly which bytecodes are generated for the
benchmarks above, but my impression is that most of the benefit comes from:
a) Special casing the integer key access in GETTABLE and SETTABLE
b) Caching savedpc and hookmask in the VM
c) Special casing LE/LT for integer comparisons (helps fannkuch
benchmark in particular).
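To illustrate point (c), here is a minimal sketch of an integer fast path in a less-than comparison. The Value type and the lt/lt_generic names are hypothetical and are not taken from the Lua sources; the mixed integer/float case is simplified to a conversion to double, which a real VM would need to handle more carefully for large integers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged value; NOT Lua's actual TValue layout. */
typedef enum { T_INT, T_FLT } Tag;
typedef struct {
    Tag tag;
    union { int64_t i; double n; } u;
} Value;

static Value mk_int(int64_t i) { Value v; v.tag = T_INT; v.u.i = i; return v; }
static Value mk_flt(double n)  { Value v; v.tag = T_FLT; v.u.n = n; return v; }

/* General path: convert both operands to double, then compare.
   (Simplification: this loses precision for very large integers.) */
static bool lt_generic(Value a, Value b) {
    double x = (a.tag == T_INT) ? (double)a.u.i : a.u.n;
    double y = (b.tag == T_INT) ? (double)b.u.i : b.u.n;
    return x < y;
}

/* Fast path: when both operands are integers, compare them directly
   and skip the conversion; otherwise fall back to the general path. */
static bool lt(Value a, Value b) {
    if (a.tag == T_INT && b.tag == T_INT)
        return a.u.i < b.u.i;
    return lt_generic(a, b);
}
```

A loop-heavy benchmark that mostly compares integers would take the first branch of lt almost every time, which is the kind of saving I have in mind.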
I see that the GETI instruction uses immediate integer key values
stored in the bytecode itself - I would imagine that this would only
help when a table is being indexed by a literal integer value? This is
a rare usage I think.
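To make the distinction concrete, here is a toy sketch of the two addressing styles: one instruction reads the key from a register (as for t[i]), the other carries the key as an immediate in the instruction word (as for t[3]). The opcode names, the 8-bit field layout, and the step function are assumptions for illustration only and do not reflect Lua's real instruction encoding.

```c
#include <stdint.h>

/* Toy opcodes; the names are hypothetical. */
enum { OP_GETTABLE_TOY, OP_GETI_TOY };

/* Toy 32-bit encoding: opcode in bits 0-7, A in bits 8-15, C in bits 16-23. */
#define MK(op, a, c) ((uint32_t)(op) | ((uint32_t)(a) << 8) | ((uint32_t)(c) << 16))
#define OPCODE(i)    ((i) & 0xFFu)
#define ARG_A(i)     (((i) >> 8) & 0xFFu)
#define ARG_C(i)     (((i) >> 16) & 0xFFu)

/* Execute one instruction. `regs` holds integer registers and `table`
   stands in for the array part of a single table with 1-based keys.
   No bounds checking is done, to keep the sketch short. */
static void step(uint32_t ins, int64_t *regs, const int64_t *table) {
    switch (OPCODE(ins)) {
    case OP_GETTABLE_TOY:
        /* Key comes from register C: one extra register read. */
        regs[ARG_A(ins)] = table[regs[ARG_C(ins)] - 1];
        break;
    case OP_GETI_TOY:
        /* Key is the immediate C itself, known when the code is compiled. */
        regs[ARG_A(ins)] = table[ARG_C(ins) - 1];
        break;
    }
}
```

In this sketch the immediate form can only be emitted when the key is a literal in the source, which is why I expect it to apply to relatively few table accesses.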
On the other hand, special casing table access for globals (TABUP) and
for string keys is a welcome change, I think, as both are heavily
used, although I don't have benchmarks I can use to validate the
improvements.
I also noted the LOADI and ADDI instructions that also store an
immediate integer value in the bytecode itself.
In Ravi, the JIT compiler already inlines integer literals when
possible, so storing the integer value in the bytecode is only useful
for the interpreter, and I am not yet sure how much benefit this
gives. Are there any performance results that can be shared?