- Subject: Re: 'setobj' in lua-5.4.0-alpha-rc2 become more faster
- From: Andrew Gierth <andrew@...>
- Date: Sat, 15 Jun 2019 20:02:40 +0100
>>>>> ">" == 重归混沌 <findstrx@gmail.com> writes:
>> I modified the Lua test code and changed nothing in the lua5.3.4 source
>> code, and setobj became faster.
>> Most likely the key point is the cache, but I found no answer in the
>> 64-ia-32-architectures-optimization-manual.

OK, I found out why this happens.

When the OP_FORLOOP code assigns a value to the visible loop variable,
it does so using two separate assignments (see the setivalue macro): one
to the value, one to the tag. The first is a 64-bit move, the second a
32-bit one. The 32-bit padding at the end of the TValue is not modified.
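
To make the two stores concrete, here is a rough sketch of the layout
involved. The names (SketchValue, SketchTValue, sketch_setivalue,
SKETCH_TNUMINT) are made up for illustration and only approximate what
lobject.h defines; the tag value is a placeholder, and the 4 bytes of
trailing padding assume a typical 64-bit ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Lua's Value union (8 bytes). */
typedef union {
    void   *p;  /* pointer payload */
    int64_t i;  /* integer payload */
    double  n;  /* float payload */
} SketchValue;

/* Hypothetical stand-in for TValue: payload, then a 32-bit tag.
   On a typical 64-bit ABI the struct is padded to 16 bytes, so
   4 bytes of padding follow tt_. */
typedef struct {
    SketchValue value_;
    int         tt_;
} SketchTValue;

/* Placeholder tag; stands in for the real integer type tag. */
enum { SKETCH_TNUMINT = 1 };

/* Two separate stores, as described above: a 64-bit store of the
   value, then a 32-bit store of the tag. The padding is never written. */
static void sketch_setivalue(SketchTValue *o, int64_t x) {
    o->value_.i = x;
    o->tt_ = SKETCH_TNUMINT;
}
```

Calling sketch_setivalue(&v, 42) on a SketchTValue v leaves v.value_.i
and v.tt_ set and whatever was in the padding bytes unchanged.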

It turns out that when you write a 32-bit value and then immediately
read the same location as a 64-bit or larger value, this causes a
stall in the processor, even (apparently) if everything is hot enough to
already be in L1 cache. Presumably the pending store, which is still
sitting in the store buffer, cannot be forwarded to a wider load that
overlaps it, so the load has to wait until the store has completed.

If instead you write a 32-bit value and then immediately read it back
_as a 32-bit value_, then there is no stall, presumably because the
processor can forward the whole value straight out of the store buffer.
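
The two read patterns can be sketched as two ways of copying such a
value. The names here (CopyDemoTValue, copy_whole, copy_fields) are
hypothetical, and this assumes that a struct assignment gets compiled
into loads wider than 32 bits, which is up to the compiler and the
optimization level.

```c
#include <stdint.h>

/* Hypothetical stand-in for TValue: 8-byte payload plus 32-bit tag. */
typedef struct {
    union { void *p; int64_t i; double n; } value_;
    int tt_;
} CopyDemoTValue;

/* Whole-struct copy. The compiler is free to read the tag together
   with the padding as one 64-bit load, or the whole struct as one
   128-bit load; either overlaps a just-written 32-bit tag. */
static void copy_whole(CopyDemoTValue *dst, const CopyDemoTValue *src) {
    *dst = *src;
}

/* Field-wise copy. The tag is read back at the same 32-bit width it
   was written with, which is the no-stall case described above
   (assuming the compiler does not merge the two accesses). */
static void copy_fields(CopyDemoTValue *dst, const CopyDemoTValue *src) {
    dst->value_ = src->value_;
    dst->tt_ = src->tt_;
}
```

Both functions produce the same result; the only difference is the
width of the load that reads the tag back, so any timing difference has
to be measured on the actual hardware and compiler output.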
--
Andrew.