lua-l archive


On 2/16/2011 9:36 PM, Francesco Abbate wrote:
2011/2/16 KHMan<>:
Sorry to barge in, it is a worrying difference. One thing is bugging me: Is
the C code running SSE2? IIRC gcc -O2 does not normally enable SSE2.

Hmmm, I have to confess that I don't have very deep knowledge of
SSE-related optimization flags. My approach was quite naive: I use
standard optimization flags like "-O2" or "-O2 -fomit-frame-pointer"
and let gcc do its work. My idea is quite simple: I want to
compare optimized C code with LuaJIT2, and by "optimized" I just mean
"standard optimizations".

On the other hand, I guess your remark is a good one: to be completely
fair, the benchmark should include the best possible optimization flags.
Probably I should use "-march=native"; I believe this is activated by
default in Ubuntu.

Always check gcc -v

Ubuntu is *very* conservative. My Ubuntu 8.04 vanilla gcc installation is saying something about i486... I don't think they will ever err on the side of native processor checks.
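
Besides reading the gcc -v output, one way to see what the compiler is really targeting is to look at the macros it predefines. The following is a minimal sketch in C, assuming gcc or clang on x86. __SSE2__ and __SSE2_MATH__ are macros those compilers define themselves; the function names and the FP_REPORT_DEMO guard are made up for this illustration.

```c
#include <stdio.h>

/* Returns 1 if the compiler was allowed to emit SSE2 instructions.
   gcc and clang predefine __SSE2__ when -msse2 (or a -march value
   that implies it) is in effect. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if scalar floating-point math is compiled to SSE2
   instructions rather than x87 ones. gcc predefines __SSE2_MATH__
   when -mfpmath=sse is active together with SSE2. */
int fp_math_uses_sse2(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical guard: compile with -DFP_REPORT_DEMO to get a small
   program that prints both answers. */
#ifdef FP_REPORT_DEMO
int main(void)
{
    printf("SSE2 instructions allowed: %d\n", built_with_sse2());
    printf("FP math done in SSE2:      %d\n", fp_math_uses_sse2());
    return 0;
}
#endif
```

On a conservatively configured 32-bit toolchain, building this once with -O2 alone and once with -O2 -msse2 -mfpmath=sse would be expected to flip both answers from 0 to 1. On x86-64, SSE2 is part of the baseline, so both are normally 1 already.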

Otherwise, there are some flags that maybe you should not activate
with GSL, so as not to degrade the accuracy. For example, I know that
you cannot use -ffast-math, and I don't know if you can use
-mfpmath=sse because, if I understood correctly, with SSE you don't
have the extra precision of 80-bit wide numbers and this can
potentially degrade the accuracy.
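
The precision question above can be checked from inside a program. The following is a minimal sketch in C, assuming a C99 compiler. FLT_EVAL_METHOD, DBL_MANT_DIG and LDBL_MANT_DIG are standard <float.h> macros; the function names and the FP_PRECISION_DEMO guard are made up for this illustration. It says nothing about whether GSL's results actually depend on the extra bits.

```c
#include <float.h>
#include <stdio.h>

/* FLT_EVAL_METHOD (C99): 0 means expressions are evaluated in the
   type of their operands (typical for SSE math), 2 means they are
   evaluated in long double (typical for x87 math), -1 means the
   method is indeterminable. */
int fp_eval_method(void)
{
    return (int)FLT_EVAL_METHOD;
}

/* Significand width of double, in bits (53 for IEEE 754 binary64). */
int double_mantissa_bits(void)
{
    return DBL_MANT_DIG;
}

/* Significand width of long double, in bits (64 for the x87 80-bit
   extended format; other platforms may differ). */
int long_double_mantissa_bits(void)
{
    return LDBL_MANT_DIG;
}

/* Hypothetical guard: compile with -DFP_PRECISION_DEMO to get a small
   program that prints the three values. */
#ifdef FP_PRECISION_DEMO
int main(void)
{
    printf("FLT_EVAL_METHOD:        %d\n", fp_eval_method());
    printf("double mantissa:        %d bits\n", double_mantissa_bits());
    printf("long double mantissa:   %d bits\n", long_double_mantissa_bits());
    return 0;
}
#endif
```

If the evaluation method is 2, intermediate results of double arithmetic may carry the wider long double significand. If it is 0, they are rounded to double at each step, which is the loss of extra precision described above.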

I am not familiar with GSL or the thing being benchmarked, so I can't comment on that. I only hope you can avoid making it fragile -- anyhow, lots of supercomputer people run BLAS or the Goto stuff and they seem to be happy with SSE*.

IIRC, wasn't LuaJIT using SSE2 for floating point? (I haven't checked the sources, I'm not totally sure of this but I believe I've read it before.)

I can run some more tests to get a fairer benchmark, but this is a
little bit outside the scope of my simple benchmark.

It might mean that it would be hard to draw conclusions from an apples-to-oranges comparison. If something more comparable is running, such as SSE2 against SSE2, then there is less variation to consider when drawing any useful conclusions from the exercise.

Well, gcc 4.5.x has autovectorization and all that, but you'll never get its benefits if you use the default i387 math. Granted, gcc won't be the greatest at those things compared to the Intel compiler, but without enabling SSE2, I suspect you will get a wide chasm if the library has to be hobbled with the x87 float instructions.

Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia