On 2/16/2011 9:36 PM, Francesco Abbate wrote:
2011/2/16 KHMan <keinhong@gmail.com>:
Sorry to barge in, but it is a worrying difference. One thing is bugging me: is the C code running SSE2? IIRC gcc -O2 does not normally enable SSE2.

Hmm, I have to confess that I don't have a very deep knowledge of SSE-related optimization flags. My approach was quite naive: I use standard optimization flags like "-O2" or "-O2 -fomit-frame-pointer" and I leave gcc to do its work. My idea is quite simple: I want to compare optimized C code with LuaJIT2, and by "optimized" I just mean "standard optimizations". On the other hand, I guess your remark is good; to be completely fair, the benchmark should include the best possible optimization flags. Probably I should use "-march=native"; I believe this is activated by default in Ubuntu.
Always check gcc -v. Ubuntu is *very* conservative. My Ubuntu 8.04 vanilla gcc installation is saying something about i486... I don't think they will ever err on the side of native processor checks.
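A minimal sketch for answering the "is it running SSE2?" question for a given set of flags, assuming GCC or Clang on x86: these compilers predefine `__SSE2__` when SSE2 code generation is enabled and `__SSE2_MATH__` when scalar floating-point math is done in SSE registers, and C99's `FLT_EVAL_METHOD` from `<float.h>` is 2 when double expressions are evaluated in x87 extended precision. The function names are hypothetical.

```c
#include <float.h>
#include <stdio.h>

/* 1 if this translation unit was compiled with SSE2 code generation
   (GCC/Clang predefine __SSE2__ in that case), else 0. */
int compiled_with_sse2(void) {
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if scalar FP math uses SSE2 registers (GCC/Clang predefine
   __SSE2_MATH__, e.g. with -msse2 -mfpmath=sse), else 0. */
int double_math_uses_sse(void) {
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Print the floating-point configuration chosen at compile time. */
void print_fp_config(void) {
    printf("SSE2 code generation: %s\n", compiled_with_sse2() ? "yes" : "no");
    printf("double math in SSE2:  %s\n", double_math_uses_sse() ? "yes" : "no");
    /* 0: evaluate in the declared type (typical with SSE2 math);
       2: evaluate in long double, i.e. x87 80-bit registers. */
    printf("FLT_EVAL_METHOD:      %d\n", (int)FLT_EVAL_METHOD);
}
```

Compiling the same file with `gcc -O2` and with `gcc -O2 -msse2 -mfpmath=sse` on 32-bit x86 and comparing what `print_fp_config()` prints shows the difference; `gcc -dM -E - < /dev/null | grep SSE` lists the same predefined macros without writing any code. On x86-64, SSE2 is part of the baseline, so both macros are normally defined there.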
Otherwise, there are some flags that maybe you should not activate with GSL, so as not to degrade the accuracy. For example, I know that you cannot use -ffast-math, and I don't know if you can use -mfpmath=sse because, if I understood correctly, with SSE you don't have the extra precision of 80-bit wide numbers, and this can potentially degrade the accuracy.
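The "extra precision" here is the 64-bit significand of the x87 80-bit format versus the 53-bit significand of an IEEE double. A minimal sketch that makes the gap visible, assuming a C99 compiler: it adds 2^-60 to 1.0 and stores the sum in each type. Whether GSL's results actually depend on those extra bits is not established by this; the function names are hypothetical.

```c
#include <float.h>

/* Store 1 + 2^-60 in a double. The volatile store forces rounding to the
   declared 53-bit significand even if the addition ran in x87 registers,
   so the 2^-60 term is lost and the result is exactly 1.0. */
double add_tiny_double(void) {
    volatile double one = 1.0, tiny = 0x1p-60;
    volatile double sum = one + tiny;
    return sum;
}

/* Same sum in long double. With the x87 80-bit format (LDBL_MANT_DIG == 64)
   the unit in the last place at 1.0 is 2^-63, so 2^-60 survives. */
long double add_tiny_long_double(void) {
    volatile long double one = 1.0L, tiny = 0x1p-60L;
    volatile long double sum = one + tiny;
    return sum;
}
```

On x86, `add_tiny_double()` returns 1.0 while `add_tiny_long_double()` does not. On platforms where `long double` is just a double (`LDBL_MANT_DIG == 53`), both return 1.0.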
I am not familiar with GSL or the thing being benchmarked, so I can't comment on that. I only hope you can avoid making it fragile -- anyhow, lots of supercomputer people run BLAS or the Goto stuff and they seem to be happy with SSE*.
IIRC, wasn't LuaJIT using SSE2 for floating point? (I haven't checked the sources, so I'm not totally sure of this, but I believe I've read it before.)
I can run some more tests to get a fairer benchmark, but this is a little bit outside the scope of my simple benchmark.
It might mean that it would be hard to draw conclusions from an apples-to-oranges comparison unless something more comparable is running, such as SSE2 against SSE2; then there is less variation to consider when drawing any useful conclusions from the exercise.
Well, gcc 4.5.x has auto-vectorization and all that, but you'll never get its benefits if you use the default i387 code. Granted, gcc won't be the greatest at those things compared to the Intel compiler, but without enabling SSE2, I suspect you will get a wide chasm if the library has to be hobbled with x87 floating-point instructions.
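To illustrate what the auto-vectorizer works on, here is a minimal sketch of the kind of loop it targets: an axpy-style update (the same operation as BLAS `daxpy`, but this hypothetical `axpy_loop` has no strides and is not the BLAS API). With SSE2 enabled, gcc can turn it into packed instructions handling two doubles per register; the x87 instruction set has no packed double operations, so without SSE2 it stays scalar.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
   A simple, dependency-free loop that GCC's vectorizer can map to packed
   SSE2 instructions when SSE2 is enabled. 'restrict' promises that x and y
   do not overlap, which removes a runtime aliasing check the vectorizer
   would otherwise need. */
void axpy_loop(size_t n, double a, const double *restrict x, double *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

For example, with `x = {1, 2, 3, 4}`, `y = {1, 1, 1, 1}` and `a = 2.0`, `axpy_loop(4, 2.0, x, y)` leaves `y = {3, 5, 7, 9}`. In GCC, `-O3` turns on `-ftree-vectorize`; gcc 4.5-era releases report vectorized loops with `-ftree-vectorizer-verbose=N`, and newer ones with `-fopt-info-vec`.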
-- 
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia