- Subject: Re: Pentium 4 and misaligned doubles
- From: Mike Pall <mikelu-0508@...>
- Date: Tue, 16 Aug 2005 20:16:34 +0200
Hi,
Rici Lake wrote:
> >Of course compiling with -malign-double is the easiest way to
> >solve this with GCC. Alas, this breaks the x86 ABI. I think this
> >doesn't matter since the whole Lua core never passes structures
> >or unions that contain doubles to C library functions or back.
>
> Doesn't that change the stack alignment of double arguments? That would
> affect programs that called, for example, lua_pushnumber(), no? Or am I
> misreading something?
The calling conventions do not change. Yes, this means a function
with (double, int, double) parameters always receives one of the
doubles unaligned. But interestingly freestanding doubles (static
or locals) are always aligned. IMHO -malign-double only affects
the x86 alignment rules for doubles within structures or unions.
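
To make the struct case concrete, here is a minimal sketch that reports where a double lands inside a structure. The struct and function names are made up for illustration; the expected offsets in the comments assume the usual i386 System V layout and are not taken from the measurements discussed in this thread.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

/* Hypothetical struct for illustration: an int followed by a double. */
struct int_then_double {
    int i;
    double d;
};

/* Byte offset of the double member inside the struct.
   On i386 without -malign-double this is typically 4; with
   -malign-double, or on x86-64, it is typically 8. */
size_t double_member_offset(void) {
    return offsetof(struct int_then_double, d);
}

/* Print the layout so builds with and without the flag can be compared. */
void print_layout(void) {
    printf("offsetof(d) = %zu, sizeof(struct) = %zu\n",
           double_member_offset(), sizeof(struct int_then_double));
}
```

Compiling this once with and once without -malign-double on a 32-bit x86 target and calling print_layout() should show whether the member offset changes.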
> >The only way I could make it work is with:
> >
> > typedef struct lua_TValue {
> > TValuefields;
> > } __attribute__ ((aligned(16))) TValue;
> >
> >A bit awkward, but solves both the stack alignment and the array
> >alignment problem.
>
> Wouldn't aligned(8) be sufficient? (At least, that wouldn't be lying to
> the compiler about the results of malloc() on x86 :) )
Umm, yes, this is sufficient. Should've checked twice. It just
doesn't work for a union or a non-structure typedef it seems.
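
A self-contained sketch of the aligned(8) variant, for anyone who wants to check it on their own compiler. The union below is a simplified stand-in for what TValuefields expands to (an assumption, the real fields differ), and the two helper functions are hypothetical. It relies on the GCC attribute syntax and the __alignof__ extension.

```c
#include <stddef.h>   /* size_t */

/* Simplified stand-in for the value union (assumption, not the Lua source). */
typedef union {
    void *p;
    double n;
    int b;
} Value;

/* The attribute sits after the closing brace, so it applies to the
   struct type itself and therefore to every TValue, including array
   elements and locals. */
typedef struct lua_TValue {
    Value value;
    int tt;
} __attribute__ ((aligned(8))) TValue;

/* Alignment the compiler assigns to TValue (GCC extension). */
size_t tvalue_alignment(void) { return __alignof__(TValue); }

/* Size of TValue; it is padded up to a multiple of the alignment. */
size_t tvalue_size(void) { return sizeof(TValue); }
```

Since sizeof(TValue) is rounded up to a multiple of 8, consecutive array elements stay 8-byte aligned as long as the array itself starts on an 8-byte boundary.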
> >On a related note: the lua_number2int() optimization should be
> >turned off if __SSE2__ is defined (which is the case with
> >-march=pentium4).
>
> I had to specify -msse2 for this to work on gcc 3.3.3
Well, not with gcc 3.3.5:
$ echo '' | gcc -march=pentium3 -E - -dM | grep SSE
#define __SSE__ 1
$ echo '' | gcc -march=pentium4 -E - -dM | grep SSE
#define __SSE2__ 1
#define __SSE__ 1
And yes, it does use the SSE/SSE2 ops in these cases, too:
$ echo 'void foo(double x, int *a) { *a = (int)x; }' >tmp.c
$ gcc -march=pentium4 -O3 -S -o - tmp.c | grep cvtt
cvttsd2si 8(%ebp), %eax
So it still seems to be safe to say that the inline assembly
replacement should only be used if !defined(__SSE2__) for
both of our compiler versions.
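
The guard could look roughly like the sketch below. All names here are hypothetical, and the x87 branch uses the well-known "add 1.5 * 2^52" union trick as a stand-in for the inline assembly, so it is not the replacement discussed above. Note that this trick rounds to nearest under the default rounding mode, while the plain cast truncates.

```c
#include <stdint.h>

#if !defined(__SSE2__) && (defined(__i386) || defined(__i386__))
/* x87 path (little-endian x86 only): adding 1.5 * 2^52 leaves the
   rounded integer in the low 32 bits of the double's mantissa. The
   volatile forces a store, so x87 extended precision cannot interfere. */
union cast_sketch { double d; int32_t i; };
#define number2int_sketch(i, d) \
    do { volatile union cast_sketch u_; \
         u_.d = (d) + 6755399441055744.0; \
         (i) = u_.i; } while (0)
#else
/* With SSE2 the compiler can emit cvttsd2si for a plain cast. */
#define number2int_sketch(i, d)  ((i) = (int)(d))
#endif

/* Small wrapper so the macro can be called like a function. */
int to_int_sketch(double x) {
    int r;
    number2int_sketch(r, x);
    return r;
}
```

For integer-valued inputs both branches agree; they differ only for values with a fractional part.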
> PS: Before anyone else notices that I'm fond of making an idiot of
> myself in public, it is quite clear why array alignment isn't important
> (but constant alignment is). The GETTABLE/SETTABLE ops (and the
> equivalent API calls) copy the table data onto the stack with a union
> copy (see the setobj macro in lobject.h), so the fp unit is presumably
> not involved.
Even if it were involved, it would cause load/store or store/load
forwarding stalls in most cases. Copying potentially unaligned
and/or non-numeric data with FP ops is a bad idea (I tried).
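
For illustration, a copy in the style of a union assignment might be sketched as follows. The types and the function name are hypothetical and much simpler than the real setobj macro in lobject.h. Which instructions are actually emitted for the copy is up to the compiler and the target flags.

```c
/* Hypothetical simplified value types (not the definitions from lobject.h). */
typedef union {
    void *p;
    double n;
    int b;
} SketchValue;

typedef struct {
    SketchValue value;
    int tt;
} SketchTValue;

/* Copy a tagged value by assigning the whole union plus the tag.
   The source never names the double member, so the compiler is free
   to move the bytes without an FP load/store. */
void copy_value_sketch(SketchTValue *dst, const SketchTValue *src) {
    dst->value = src->value;
    dst->tt = src->tt;
}
```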
> The key object, on the other hand, is not copied (if the
> constant is in the acceptable range for RK operands) and is then used
> directly as a number, so its alignment is important.
Thankfully non-integer numeric keys are rare.
Bye,
Mike