lua-users home
lua-l archive

Hi,

Rici Lake wrote:
> >Of course compiling with -malign-double is the easiest thing to
> >solve this with GCC. Alas, this breaks the x86 ABI. I think this
> >doesn't matter since the whole Lua core never passes structures
> >or unions that contain doubles to C library functions or back.
> 
> Doesn't that change the stack alignment of double arguments? That would 
> affect programs that called, for example, lua_pushnumber(), no? Or am I 
> misreading something?

The calling conventions do not change. Yes, this means a function
with (double, int, double) parameters always receives one of the
doubles unaligned. Interestingly, though, freestanding doubles (statics
or locals) are always aligned. As far as I can tell, -malign-double only
changes the x86 alignment rules for doubles within structures or unions.
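
A quick way to check this on a particular compiler is to look at where a double lands inside a structure and compare that with a freestanding one. This is only a sketch and not taken from the Lua sources: struct tagged and all three helper names are made up for illustration, and the offsets mentioned in the comments are what one would typically expect, not guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tagged value: an int followed by a double. */
struct tagged {
  int tag;
  double d;
};

/* Offset of the double inside the struct. Under the i386 System V ABI this
   is typically 4; with -malign-double (or on x86-64) it is typically 8. */
size_t tagged_double_offset(void) {
  return offsetof(struct tagged, d);
}

/* Nonzero if p is a multiple of n bytes; n must be a power of two. */
int is_aligned(const void *p, size_t n) {
  assert(n != 0 && (n & (n - 1)) == 0);
  return ((uintptr_t)p & (n - 1)) == 0;
}

/* Nonzero if a freestanding static double ended up 8-byte aligned. */
int static_double_is_aligned8(void) {
  static double x;
  return is_aligned(&x, 8);
}
```

Compiling this once with and once without -malign-double and printing tagged_double_offset() should show whether the flag changes the structure layout on a given target.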

> >The only way I could make it work is with:
> >
> >  typedef struct lua_TValue {
> >    TValuefields;
> >  } __attribute__ ((aligned(16))) TValue;
> >
> >A bit awkward, but solves both the stack alignment and the array
> >alignment problem.
> 
> Wouldn't aligned(8) be sufficient? (At least, that wouldn't be lying to 
> the compiler about the results of malloc() on x86 :) )

Umm, yes, aligned(8) is sufficient. Should've checked twice. The
attribute just doesn't seem to work on a union or a non-structure typedef.
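
For reference, the aligned(8) variant can be sketched as follows. The demo_Value and demo_TValue types are hypothetical stand-ins (the TValuefields macro is replaced by a plain union plus an int tag), and __attribute__ and __alignof__ are GCC extensions, so this assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what TValuefields expands to:
   a value union plus a type tag. */
typedef union demo_Value {
  void *p;
  double n;
  int b;
} demo_Value;

typedef struct demo_TValue {
  demo_Value value;
  int tt;
} __attribute__ ((aligned(8))) demo_TValue;

/* Alignment the compiler assigns to the struct type. */
size_t demo_tvalue_align(void) {
  return __alignof__(demo_TValue);
}

/* Distance between consecutive array elements. A multiple of 8 means the
   double in every element is aligned whenever the array start is. */
size_t demo_tvalue_stride(void) {
  demo_TValue a[2];
  assert(sizeof(demo_TValue) % 8 == 0);
  return (size_t)((char *)&a[1] - (char *)&a[0]);
}
```

Note that the attribute only constrains the type. Memory handed out by malloc() still has to be at least 8-byte aligned for the array case to hold.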

> >On a related note: the lua_number2int() optimization should be
> >turned off if __SSE2__ is defined (which is the case with
> >-march=pentium4).
> 
> I had to specify -msse2 for this to work on gcc 3.3.3

Well, not with gcc 3.3.5:

$ echo '' | gcc -march=pentium3 -E - -dM | grep SSE
#define __SSE__ 1
$ echo '' | gcc -march=pentium4 -E - -dM | grep SSE
#define __SSE2__ 1
#define __SSE__ 1

And yes, it does use the SSE/SSE2 ops in these cases, too:

$ echo 'void foo(double x, int *a) { *a = (int)x; }' >tmp.c
$ gcc -march=pentium4 -O3 -S -o - tmp.c | grep cvtt
        cvttsd2si       8(%ebp), %eax

So it still seems to be safe to say that the inline assembly
replacement should only be used if !defined(__SSE2__) for
both of our compiler versions.
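
A guard along these lines would do it. This is a minimal sketch, not the actual luaconf.h text: demo_number2int and demo_to_int are hypothetical names, and the inline assembly is just one plausible way to write the x87 path with GCC extended asm.

```c
#include <assert.h>

#if defined(__GNUC__) && defined(__i386__) && !defined(__SSE2__)
/* x87 only: fistpl stores and pops the top of the FPU stack, rounding
   according to the current FPU rounding mode. */
#define demo_number2int(i,d) \
  __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st")
#else
/* With SSE2 (or on any other target) the plain cast is used; GCC turns it
   into cvttsd2si when SSE2 code generation is enabled. */
#define demo_number2int(i,d)  ((i) = (int)(d))
#endif

/* Convert a double that is known to fit into an int. */
int demo_to_int(double d) {
  int i;
  assert(d >= -2147483648.0 && d <= 2147483647.0);
  demo_number2int(i, d);
  return i;
}
```

The two branches only agree for integral values: the cast truncates towards zero, while fistpl rounds (to nearest in the default mode).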

> PS: Before anyone else notices that I'm fond of making an idiot of 
> myself in public, it is quite clear why array alignment isn't important 
> (but constant alignment is). The GETTABLE/SETTABLE ops (and the 
> equivalent API calls) copy the table data onto the stack with a union 
> copy (see the setobj macro in lobject.h), so the fp unit is presumably 
> not involved.

Even if it were involved, it would cause load/store or store/load
forwarding stalls in most cases. Copying potentially unaligned
and/or non-numeric data with FP ops is a bad idea (I tried).
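
The kind of copy being discussed looks roughly like this. It is a simplified sketch in the spirit of the setobj macro, not the lobject.h definition: the cp_ types, tags and helpers are hypothetical. The value is assigned as a whole union, not through its double member, so the compiler is not obliged to go through the FP unit (though it remains free to pick whatever moves it likes).

```c
#include <assert.h>

typedef union cp_Value {
  void *p;
  double n;
  int b;
} cp_Value;

typedef struct cp_TValue {
  cp_Value value;
  int tt;
} cp_TValue;

enum { CP_TPOINTER = 2, CP_TNUMBER = 3 };  /* hypothetical type tags */

/* Copy value and tag; the union is assigned as an opaque whole. */
#define cp_setobj(dst, src) \
  do { const cp_TValue *s_ = (src); cp_TValue *d_ = (dst); \
       d_->value = s_->value; d_->tt = s_->tt; } while (0)

/* Store a number, copy it, and read it back from the copy. */
double cp_roundtrip_number(double x) {
  cp_TValue src, dst;
  src.value.n = x;
  src.tt = CP_TNUMBER;
  cp_setobj(&dst, &src);
  assert(dst.tt == CP_TNUMBER);
  return dst.value.n;
}

/* Same for a pointer, i.e. non-numeric data going through the same copy. */
void *cp_roundtrip_pointer(void *p) {
  cp_TValue src, dst;
  src.value.p = p;
  src.tt = CP_TPOINTER;
  cp_setobj(&dst, &src);
  assert(dst.tt == CP_TPOINTER);
  return dst.value.p;
}
```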

> The key object, on the other hand, is not copied (if the 
> constant is in the acceptable range for RK operands) and is then used 
> directly as a number, so its alignment is important.

Thankfully non-integer numeric keys are rare.

Bye,
     Mike