On Jul 13, 2014, at 6:53 AM, Geoff Smith <firstname.lastname@example.org> wrote:
Interesting numbers. A couple of comments…
Benchmarking C code is tricky, even for simple “for” loops. Non-optimized code (-O0) will of course not give a representative result, but with something like -O2 (or the VC equivalent) the compiler will often optimize the loop out of existence (literally). You should generate the assembly language output and inspect it to see what the compiler actually produces.
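Here is a minimal sketch of one way to keep a timed loop from being discarded. The kernel `sum_to`, the `sink` variable, and the `time_sum_to` helper are all hypothetical names of mine, not anything from the original benchmark. Storing the result in a `volatile` object obliges the compiler to produce the value, but it may still replace the loop with a closed-form expression, so reading the assembly (for example via the `-S` option of GCC or clang) remains the real check.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Volatile sink: a store to this object cannot be removed as dead code. */
static volatile uint32_t sink;

/* Hypothetical kernel under test: sums the integers 0 .. n-1
 * (unsigned arithmetic, so it wraps modulo 2^32 for large n). */
static uint32_t sum_to(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* Returns the CPU time, in seconds, spent in one call to sum_to(n). */
static double time_sum_to(uint32_t n)
{
    clock_t start = clock();
    assert(start != (clock_t)-1);   /* clock() reports failure as -1 */

    sink = sum_to(n);               /* result must be materialized */

    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_sum_to` with a large `n` at both -O0 and -O2 and comparing the two assembly listings should show fairly quickly whether the loop survived.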
The performance of 64-bit integers on a 32-bit target can also vary widely depending on the compiler, CPU, and optimization settings. Most 32-bit instruction sets do have some support for 64-bit operations, but of course their use depends on the compiler. GCC and clang both make good use of the 64-bit instructions even when compiling for 32-bit *IF* the compiler is told that they are available (otherwise it must compile for a generic x86), and this can make a huge difference.
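To see the effect for yourself, a small kernel like the one below can be compiled to assembly twice and the listings compared. This is only a sketch: `dot64` is a hypothetical function of mine, and the command lines in the comment are an assumption about a GCC or clang toolchain on an x86 host. Whether the two listings actually differ will depend on the compiler version and the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical kernel: 64-bit sum of products of 32-bit inputs.
 *
 * Assumed commands for comparing generated code (GCC shown; the same
 * options exist in clang):
 *   gcc -m32 -O2 -S dot64.c                 generic 32-bit x86
 *   gcc -m32 -O2 -march=native -S dot64.c   host CPU's extensions allowed
 */
uint64_t dot64(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);

    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint64_t)a[i] * b[i];   /* widen before multiplying */
    return acc;
}
```

The cast on `a[i]` matters: without it the multiply would be done in 32 bits and wrap before being added to the 64-bit accumulator.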
The C code float vs double times are curious. I suspect this is the result of extra conversions between single and double precision inserted by the compiler.
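As an illustration of the kind of thing I mean (I have not seen the benchmark source, so this is a guess at the pattern, with hypothetical function names): an unsuffixed constant such as `0.1` has type double, so a float operand is converted to double, multiplied, and converted back. Adding the `f` suffix keeps the whole expression in single precision.

```c
#include <assert.h>

/* 0.1 is a double constant: x is converted to double, the multiply is
 * done in double, and the result is converted back to float. */
static float scale_promoted(float x)
{
    return x * 0.1;
}

/* 0.1f is a float constant: no conversion to double is required. */
static float scale_single(float x)
{
    return x * 0.1f;
}

/* Sanity check: the two versions agree to within float rounding. */
static void check_scale(void)
{
    float d = scale_promoted(3.0f) - scale_single(3.0f);
    assert(d < 1e-6f && d > -1e-6f);
}
```

If the float version of the benchmark contains unsuffixed constants or calls to double-precision math functions, the assembly listing should show the conversion instructions around them.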