On Sat, Nov 14, 2009 at 9:53 PM, Mike Pall
<mikelu-0911@mike.de> wrote:
The interpreter stores all intermediate results to stack slots,
i.e. to IEEE 754 double precision FP numbers. This is equivalent
to the -ffloat-store option of GCC or Java with strictfp.
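For illustration, here is a minimal C sketch of the "store every intermediate" discipline described above. The function name `mul_add_stored` is hypothetical and is not LuaJIT code. Storing to a `volatile double` forces each intermediate to be rounded to a 64-bit double, which is roughly what -ffloat-store does. It also keeps the compiler from contracting the expression into a fused multiply-add.

```c
/* Hypothetical helper: computes a*b + c with the product rounded to a
   64-bit double before the addition, mimicking an interpreter that writes
   every intermediate result to a stack slot. The volatile stores force the
   rounding and prevent contraction into a fused multiply-add. */
static double mul_add_stored(double a, double b, double c) {
    volatile double t = a * b;  /* intermediate rounded to double here */
    volatile double r = t + c;  /* final result rounded to double */
    return r;
}
```

With a = 0.1, b = 10.0 and c = -1.0, the stored product rounds to exactly 1.0, so the result is 0.0. If the product were instead kept in x87 extended precision or fused into an FMA, the result could come out as roughly 5.55e-17.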
This does not solve the double rounding problem, though.
Indeed, that's what the down/up scaling is for.
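The double rounding problem is easy to miss, so here is a small demonstration. It is an analogy rather than the x87 case itself: it rounds a 64-bit integer to float either directly or by way of double, because that reproduces on SSE2 hardware. The x87 case is the same pattern one level up, with extended precision rounded to double. Both function names are hypothetical.

```c
#include <stdint.h>

/* One rounding: int64 straight to a 24-bit float significand. */
static float round_once(int64_t n) {
    volatile float f = (float)n;
    return f;
}

/* Two roundings: first to a 53-bit double significand, then to float.
   The first rounding can land exactly on a float tie, and then
   round-half-to-even in the second step goes the "wrong" way. */
static float round_twice(int64_t n) {
    volatile double d = (double)n;  /* first rounding */
    volatile float f = (float)d;    /* second rounding */
    return f;
}
```

Take n = 2^60 + 2^36 + 1. The float ulp at 2^60 is 2^37, so rounding once gives 2^60 + 2^37, because the remainder lies just above half an ulp. Going through double first drops the trailing +1 and leaves an exact tie, which then rounds to the even neighbour, 2^60.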
[BTW: That still doesn't solve the various issues with the x87
transcendental functions or with pow().]
Yes, I will replace them with fdlibm derivatives.
The current interpreter is pure position-independent code and I
want to keep it that way. I've used loads from upvalues or
construction of constants on the stack to get around these
limitations in some cases. I think this is fast enough. If you
really need this to be maximally fast, I'd keep them in the
global_State, which can be addressed via the DISPATCH register.
Ah, I considered this, but DISPATCH isn't available in lj_vm_foldarith, right?
Then again, lj_vm_foldarith doesn't need to be very fast anyway.
But IMHO the better solution is to (optionally) use SSE2 in the
interpreter, too. The intersection of the set of people who still
have an x87-only box and the set of people who really need
reproducible FP arithmetic is probably empty.
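For the C parts of the VM (such as lj_vm_foldarith), one way to check at compile time whether the compiler evaluates FP expressions SSE2-style or x87-style is the C99 `FLT_EVAL_METHOD` macro from `<float.h>`. The sketch below shows that check. The function name `fp_eval_method_name` is hypothetical, and this is not how LuaJIT selects its FP backend.

```c
#include <float.h>

/* Hypothetical helper: describes how the compiler evaluates float/double
   expressions, according to the C99 FLT_EVAL_METHOD macro. */
static const char *fp_eval_method_name(void) {
#if FLT_EVAL_METHOD == 0
    return "each operation in its own type (e.g. SSE2)";
#elif FLT_EVAL_METHOD == 1
    return "float and double evaluated as double";
#elif FLT_EVAL_METHOD == 2
    return "everything evaluated as long double (e.g. x87)";
#else
    return "indeterminate or implementation-defined";
#endif
}
```

A value of 0 means intermediates are not silently widened, so no store/reload tricks are needed for double-precision reproducibility. A value of 2 is the typical 32-bit x87 build, where they are.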
I agree it's hardly worth the bother, but I'd like to keep things as portable as possible unless it's a large effort.
My particular application right now is something similar to
http://love2d.org/ but with an easy network synchronization
mechanism (which is what I need reproducible floating point for).
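With lockstep network synchronization, peers usually detect a desync by comparing a checksum of the simulation state each tick. Below is a minimal sketch, assuming the state is a plain array of doubles. It uses 64-bit FNV-1a over the raw IEEE 754 bit patterns, and the name `state_checksum` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit FNV-1a over the IEEE 754 bit patterns of a
   state vector. Hashing bits (not values) catches any divergence, even
   one ulp, or 0.0 vs -0.0. */
static uint64_t state_checksum(const double *v, size_t n) {
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &v[i], sizeof bits);  /* type-pun without UB */
        for (int b = 0; b < 8; b++) {
            h ^= (bits >> (8 * b)) & 0xFFu;
            h *= 1099511628211ULL;          /* FNV-1a 64-bit prime */
        }
    }
    return h;
}
```

Bytes are taken from the integer value rather than from memory, so the checksum does not depend on host endianness. Peers can therefore exchange it directly.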