lua-users home
lua-l archive

David Given wrote:
> Does LJ2 not use dual representation where numbers are promoted from  
> integer to float as needed? I'd have thought (based on my minimal  
> knowledge of trace compilers...) that this wouldn't be terribly hard and  
> would allow, e.g., use of the integer ALU instructions where possible  
> and only fall back to floats for those operations that needed it. Is the  
> extra complexity not worth the speed boost in real life?

Well, it does. But too much narrowing is not beneficial, at least
on a desktop-class superscalar CPU. In general, narrowing replaces
one FP ALU op with an int ALU op plus a branch for the overflow
check.
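
To make the trade-off concrete, here is a minimal C sketch of the two
variants of an addition. It is an illustration only, not LJ2's generated
code: the function names are hypothetical, and the overflow test is done
by widening to 64 bits for portability, where real machine code would
typically branch on the CPU's overflow flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* FP version: a single FP ALU op, no guard needed. */
static double add_fp(double a, double b) {
    return a + b;
}

/* Narrowed version: one int ALU op plus a branch for the overflow check.
 * Returns false where a compiled trace would have to leave the fast path.
 * Name and signature are hypothetical, for illustration only. */
static bool add_narrowed(int32_t a, int32_t b, int32_t *out) {
    int64_t wide = (int64_t)a + (int64_t)b;    /* cannot overflow in 64 bits */
    if (wide < INT32_MIN || wide > INT32_MAX)  /* the extra branch */
        return false;
    *out = (int32_t)wide;
    return true;
}
```

Every narrowed arithmetic op of this kind carries its own branch, which
is what competes for the single branch slot per cycle.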

But the execution bandwidth of int or FP ALU ops is e.g. 3 per
cycle vs. only 1 branch per cycle on a Core2. Saturating the
branch execution port and the reorder buffers quickly nullifies
any performance gain. Another problem is the increased pressure on
integer registers and integer execution bandwidth, while the
available FPU execution bandwidth is effectively wasted.

It's hard to avoid the overflow checks, except for induction
variables. And you need one more compare and branch to preserve -0
for NEG/MUL/DIV. The longer latencies for FP ops are a non-issue,
except in loop-controlling expressions. Of course you can't avoid
the integer conversions for indexing expressions.
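
The -0 problem can be sketched for a narrowed multiplication as follows.
This is a hedged illustration under the assumption of 32-bit integers
and IEEE 754 doubles; `mul_narrowed` is a hypothetical name, not an LJ2
function. In FP arithmetic 0 * -5 yields -0, which an integer cannot
represent, so the narrowed version needs a second compare and branch on
top of the overflow check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Narrowed multiplication with both guards.
 * Returns false where the integer result would differ from the FP result. */
static bool mul_narrowed(int32_t a, int32_t b, int32_t *out) {
    int64_t wide = (int64_t)a * (int64_t)b;    /* fits: |a*b| <= 2^62 */
    if (wide < INT32_MIN || wide > INT32_MAX)  /* branch 1: overflow check */
        return false;
    if (wide == 0 && (a < 0 || b < 0))         /* branch 2: FP result is -0 */
        return false;
    *out = (int32_t)wide;
    return true;
}
```

For example, `mul_narrowed(0, -5, &r)` fails the second guard, while
`mul_narrowed(0, 5, &r)` passes with a result of 0.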

That's why LJ2 does predictive narrowing for induction variables
and demand-driven narrowing via backpropagation for indexing
expressions and bit operations. Numbers are never stored as
integers, except inside compiled traces. The interpreter and most
parts of the VM only have to deal with a single FP number type.
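
A rough C sketch of the idea, under simplifying assumptions: numbers are
stored as doubles everywhere, the loop bounds are checked once up front
so the induction variable can run as a plain integer without a
per-iteration overflow check, and the index is therefore already an
integer when it is needed. The helper names `num_to_int32` and
`sum_range` are hypothetical, the array is 0-based as usual in C, and
this is not how LJ2 is actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Guard: convert a stored FP number to int32 only if the value is an
 * integer in range. -0 converts to 0, which is fine for an index. */
static bool num_to_int32(double d, int32_t *out) {
    if (!(d >= -2147483648.0 && d <= 2147483647.0))  /* also rejects NaN */
        return false;
    int32_t i = (int32_t)d;
    if ((double)i != d)                              /* fractional part */
        return false;
    *out = i;
    return true;
}

/* Sum t[start..stop] (step 1). Bounds arrive as doubles, as stored values
 * would; all guards run once, before the loop. */
static bool sum_range(const double *t, size_t len,
                      double start, double stop, double *sum) {
    int32_t lo, hi;
    if (!num_to_int32(start, &lo) || !num_to_int32(stop, &hi))
        return false;
    if (lo < 0 || (hi >= 0 && (size_t)hi >= len))    /* bounds check */
        return false;
    if (hi == INT32_MAX)                             /* i++ must not overflow */
        return false;
    double acc = 0.0;
    for (int32_t i = lo; i <= hi; i++)               /* integer induction var */
        acc += t[i];                                 /* values stay FP */
    *sum = acc;
    return true;
}
```

Only the loop counter and the index are integers here. The array
elements and the accumulator remain doubles throughout.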

This works out just fine for desktop-class CPUs, but of course one
needs to rethink narrowing for FP-challenged CPUs. In this case a
dual representation of stored number values with normalization
steps after FP operations is probably better. But this has
far-reaching consequences for the whole VM. So far I've been able
to avoid it.
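
For illustration, a dual representation with a normalization step could
look roughly like the following C sketch. The `Value` type and all
function names are hypothetical and chosen only for this example. It
assumes IEEE 754 doubles and 32-bit integers, and it says nothing about
how such a scheme would actually be integrated into a VM.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stored value: either an int32 or a double. */
typedef struct {
    bool is_int;
    union { int32_t i; double n; } u;
} Value;

static Value make_int(int32_t i) { Value v; v.is_int = true;  v.u.i = i; return v; }
static Value make_num(double n)  { Value v; v.is_int = false; v.u.n = n; return v; }

static double to_num(Value v) { return v.is_int ? (double)v.u.i : v.u.n; }

/* Normalization step: store an FP result as an integer when it is an
 * exact int32. -0 must stay a double, since an integer cannot hold it. */
static Value normalize(double d) {
    if (d >= -2147483648.0 && d <= 2147483647.0) {   /* false for NaN */
        int32_t i = (int32_t)d;
        if ((double)i == d && !(i == 0 && signbit(d)))
            return make_int(i);
    }
    return make_num(d);
}

/* An operation that needs FP: compute in double, then normalize. */
static Value value_div(Value a, Value b) {
    return normalize(to_num(a) / to_num(b));
}
```

With this sketch, 6 / 3 comes back as the integer 2, while 1 / 2 stays
the double 0.5. Every consumer of a `Value` now has to handle both
cases, which hints at why the consequences reach across the whole VM.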

--Mike