Yes, I think you are right. The extra cost is a store-forwarding penalty.
I found the answer in <<64-ia-32-architectures-optimization-manual>>.
In the Lua 5.3.4 source code, the last statement in `vmcase(OP_FORLOOP)` is `setivalue(ra+3, idx)`, and `setivalue` uses two separate assignments: one for `value_` (64 bits) and one for `tt_` (32 bits).
Then the Lua VM executes `vmcase(OP_MOVE)`: `setobjs2s(L, ra, RB(i))`. `setobjs2s` also uses two assignments, but both of them are for 64-bit data.
From <<Intel® 64 and IA-32 Architectures Optimization Reference Manual>> => Chapter 3.6.5 => Figure 3-3 => Condition (b):
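`Size of Load > Store` incurs a penalty. That appears to be the case here: a 64-bit load that covers `tt_` follows the 32-bit store to `tt_`.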
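To make the access pattern concrete, here is a minimal C sketch of the two steps described above. It is not the Lua source: `SketchTValue`, `SKETCH_TNUMINT`, `sketch_setivalue`, and `sketch_copy_words` are hypothetical names, the tag value is arbitrary, and the two-word copy only imitates how a compiler may lower a 16-byte struct copy on x86-64. An optimizing compiler is free to emit different instructions for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a Lua 5.3 value slot: a 64-bit payload plus a
   32-bit type tag. All names here are hypothetical, not Lua's own. */
typedef struct {
    int64_t value_;  /* 8 bytes */
    int32_t tt_;     /* 4 bytes, followed by 4 bytes of padding */
} SketchTValue;

/* Arbitrary tag value chosen for this sketch. */
enum { SKETCH_TNUMINT = 1 };

/* The sketch assumes the slot is exactly two 64-bit words wide. */
static_assert(sizeof(SketchTValue) == 2 * sizeof(uint64_t),
              "sketch assumes a 16-byte slot");

/* Like the setivalue step: two separate stores of different widths. */
static void sketch_setivalue(SketchTValue *o, int64_t i) {
    o->value_ = i;            /* 64-bit store */
    o->tt_ = SKETCH_TNUMINT;  /* 32-bit store */
}

/* Like the setobjs2s step: the slot is copied as two 64-bit words.
   The second word covers tt_ plus its padding, so that load is wider
   than the 32-bit store made by sketch_setivalue. */
static void sketch_copy_words(SketchTValue *dst, const SketchTValue *src) {
    uint64_t words[2];
    memcpy(words, src, sizeof words);  /* two 64-bit loads */
    memcpy(dst, words, sizeof words);  /* two 64-bit stores */
}
```

Calling `sketch_setivalue` on a slot and then `sketch_copy_words` from that same slot reproduces the sequence in question: a 32-bit store to the tag, immediately followed by a 64-bit load of the word that contains it.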