Asko Kauppi wrote:
> In my view, Mike's results are in line with the Core Duo results I had 
> measured.
> What cannot be done is treat x86 as a single optimization target. It is 
> not.

The results for a Pentium III or a Pentium 4 are comparable.

> Then again, modern x86's are utterly fast on FP, which they calculate as 
> fast as integers. The goal with them is to not slow down unpatched 
> performance, and it is not totally clear to me where the -2..-8% slowdown 
> actually comes from. I will have a look.

Almost every bytecode instruction now has to check whether a
number is an integer or an FP type. And the arithmetic
instructions have to check for overflow. This adds many
comparisons and branches to the fast paths and pessimizes
register allocation, too. This in turn puts additional pressure
on the integer pipelines, which are already saturated with the
interpreter dispatch.
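
To make the extra work concrete, here is a rough C sketch of an
addition with and without a dual integer/FP number type. It is
only an illustration: the Value struct, the tag names and the
function names are made up for this example and are not the
actual LNUM or Lua VM code.

```c
#include <stdint.h>

/* Hypothetical tagged number; NOT the real Lua TValue layout. */
typedef enum { T_INT, T_FLT } Tag;

typedef struct {
    Tag tag;
    union { int32_t i; double d; } u;
} Value;

static Value make_int(int32_t i) { Value v; v.tag = T_INT; v.u.i = i; return v; }
static Value make_flt(double d)  { Value v; v.tag = T_FLT; v.u.d = d; return v; }

static double to_double(Value v) {
    return v.tag == T_INT ? (double)v.u.i : v.u.d;
}

/* FP-only number type: a single FP add, no branches at all. */
static double add_fp_only(double a, double b) {
    return a + b;
}

/* Dual number type: two tag checks plus an overflow check sit on
 * the fast path before the result can be stored. */
static Value add_dual(Value a, Value b) {
    if (a.tag == T_INT && b.tag == T_INT) {
        /* Widen to 64 bits so the sum itself cannot overflow. */
        int64_t wide = (int64_t)a.u.i + (int64_t)b.u.i;
        if (wide >= INT32_MIN && wide <= INT32_MAX)
            return make_int((int32_t)wide);
        /* Out of int32 range: fall back to the FP representation. */
        return make_flt((double)wide);
    }
    /* At least one operand is FP: convert and add as doubles. */
    return make_flt(to_double(a) + to_double(b));
}
```

The compares and branches in add_dual() all run on the integer
units, which is where the extra pressure comes from. A real
interpreter would inline this into the dispatch loop, but the
number of checks per instruction stays the same.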

Reducing pressure on the FP pipelines in exchange for higher
integer pressure is counterproductive on a modern desktop-class
CPU. This is simply wasted CPU bandwidth.

If you think this through, you'll have to realize that LNUM is
entirely pointless on a desktop CPU, because its basic assumption
(that integers are somehow faster than FP numbers) is unsound.

> The reason for LNUM slowing modern x86 is maybe just that integer
> calculations (such as a simple increment) need to be range checked to
> find out potential falling to FP realm. For an FP, any operation can
> just be done, without checks. I'll look for a neato way to bypass this,
> so x86 double+int32 users won't be hit by the patch (if it gets in some
> day).

This is only part of the problem. And you won't be able to
completely remove the overflow checks without interval analysis
(which requires data-flow analysis, i.e. unfeasible for an