- Subject: Re: luajit vs C performance, unexplainable difference ?
- From: Ico Doornekamp <lua@...>
- Date: Tue, 20 Dec 2011 13:05:20 +0100
A short follow-up to let the list know the source of my problem: the code
apparently performed a lot of calculations producing 'denormal numbers',
which are handled much more slowly on some hardware [1]. My solution
(workaround?) was to enable SSE and add the -ffast-math flag to gcc, which
tells the compiler I don't really care about very precise answers.
I'm not sure how denormals affect luajit, but in this case they do not
seem to be a problem for the luajit implementation.
1. http://en.wikipedia.org/wiki/Denormal_number#Performance_issues
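As far as I know, when -ffast-math is used at link time gcc on x86 adds startup code that switches the SSE unit into flush-to-zero (FTZ) and denormals-are-zero (DAZ) mode. Those MXCSR bits only apply to SSE arithmetic, not to the x87 FPU, which fits with SSE having to be enabled as well. The sketch below is a minimal illustration that assumes an x86-64 target, where double arithmetic runs on SSE; the function names are hypothetical, not from any library. It produces a denormal on purpose and shows it being flushed to zero once the two bits are set by hand.

```c
#include <float.h>   /* DBL_MIN */
#include <math.h>    /* fpclassify, FP_SUBNORMAL */

/* Assumption: MXCSR is only touched on x86-64, where SSE (and DAZ
   support) is guaranteed and doubles are computed in SSE registers. */
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */
#define HAVE_MXCSR 1
#endif

/* Returns 1 if x is a denormal (subnormal) double, 0 otherwise. */
int is_subnormal(double x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}

/* DBL_MIN is the smallest positive *normal* double, so a third of it can
   only be represented as a subnormal. The volatile read forces the
   division to happen at run time, under the current MXCSR settings. */
double tiny_quotient(void)
{
    volatile double x = DBL_MIN;
    return x / 3.0;
}

/* Sets the flush-to-zero (bit 15) and denormals-are-zero (bit 6) bits in
   MXCSR for the calling thread. Returns 1 on success, or 0 where MXCSR
   is not available, in which case nothing is changed. */
int enable_ftz_daz(void)
{
#ifdef HAVE_MXCSR
    _mm_setcsr(_mm_getcsr() | 0x8000u | 0x0040u);
    return 1;
#else
    return 0;
#endif
}
```

On such a target, tiny_quotient() should return a subnormal (about 7.4e-309) before enable_ftz_daz() is called and 0.0 afterwards.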
* On Tue Dec 20 11:24:23 +0100 2011, Eike Decker wrote:
> Just a shot in the dark, but maybe it's a double/int conversion in the
> first loop? You are mixing int and double values there... what if you
> make sure that all arithmetic operations are done in double precision?
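For what it's worth, in the C version quoted below the only int/double mixing is `t[i].f = i * 0.25` in the initialisation loop, which runs N times rather than N*S times. Under C's usual arithmetic conversions the int operand is converted to double before the multiplication, so writing the cast out explicitly should not change anything. A minimal sketch of that point (the two function names are made up for illustration):

```c
/* Hypothetical helpers isolating the expression `i * 0.25` from the
   initialisation loop. Both perform the same double-precision multiply:
   the usual arithmetic conversions turn the int into a double first. */
double scale_implicit(int i)
{
    return i * 0.25;          /* i is implicitly converted to double */
}

double scale_explicit(int i)
{
    return (double)i * 0.25;  /* the same conversion, written out */
}
```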
> On Dec 20, 2011 11:05 AM, "Ico Doornekamp" <lua@zevv.nl> wrote:
>
> > * On Tue Dec 20 09:38:28 +0100 2011, steve donovan wrote:
> >
> > > On Tue, Dec 20, 2011 at 10:33 AM, Ross Bencina
> > > <rossb-lists@audiomulch.com> wrote:
> > > > What are your results if you comment these out?
> > >
> > > Precisely my thought. Commented out the print/printf, pushed S up to
> > > 10000, and the luajit and C times are practically the same, at about
> > > 0.6 sec.
> >
> > Ok, so there seems to be some kind of architecture / implementation
> > thing going on. I changed the programs not to print anything, new code
> > attached below. The time results:
> >
> >   plain lua : 3.952s
> >   luajit    : 0.055s
> >   gcc -O0   : 1.394s
> >   gcc -O3   : 1.395s
> >
> > in which gcc is still 25 times slower than luajit!
> >
> > Note that the above numbers were measured on my Core 2 Duo.
> >
> > I just did the same test on an Intel Xeon @ 3.00GHz:
> >
> >   plain lua : 1.367s
> >   luajit    : 0.060s
> >   gcc -O0   : 0.367s
> >   gcc -O3   : 0.014s
> >
> > Things start to get interesting here: I think there might be an issue
> > with optimization on my Core 2, since there is virtually no difference
> > between the unoptimized and the optimized versions. On the Xeon the
> > results are as expected, though, with C coming out ahead of luajit, but
> > not by much.
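For reference, the plain quotients of the reported times are given below. They are nothing more than that: at tens of milliseconds, process startup and timer resolution may account for a noticeable part of each number.

```latex
$$
\frac{t_{\text{gcc -O3}}^{\text{Core 2}}}{t_{\text{luajit}}^{\text{Core 2}}}
  = \frac{1.395\,\text{s}}{0.055\,\text{s}} \approx 25.4,
\qquad
\frac{t_{\text{luajit}}^{\text{Xeon}}}{t_{\text{gcc -O3}}^{\text{Xeon}}}
  = \frac{0.060\,\text{s}}{0.014\,\text{s}} \approx 4.3
$$
```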
> >
> > I guess my problem has no place on the lua list after all. Apologies for
> > the noise; I will move to the appropriate mailing list as soon as I find
> > out where I need to go :)
> >
> > Thanks,
> >
> > Ico
> >
> >
> >
> > ----------------------------------------------------------------------
> >
> > local N = 4000
> > local S = 1000
> >
> > local t = {}
> >
> > -- Initialise: a = 0, b = 1, f grows linearly with the index.
> > for i = 0, N do
> >   t[i] = {
> >     a = 0,
> >     b = 1,
> >     f = i * 0.25
> >   }
> > end
> >
> > -- S sweeps of the coupled update over the first N elements.
> > for j = 0, S-1 do
> >   for i = 0, N-1 do
> >     t[i].a = t[i].a + t[i].b * t[i].f
> >     t[i].b = t[i].b - t[i].a * t[i].f
> >   end
> > end
> >
> > return t[1].a
> >
> > ----------------------------------------------------------------------
> >
> > #include <stdio.h>
> >
> > #define N 4000
> > #define S 1000
> >
> > struct t {
> >     double a, b, f;
> > };
> >
> > int main(void)
> > {
> >     int i, j;
> >     struct t t[N];
> >
> >     /* Initialise: a = 0, b = 1, f grows linearly with the index. */
> >     for (i = 0; i < N; i++) {
> >         t[i].a = 0;
> >         t[i].b = 1;
> >         t[i].f = i * 0.25;
> >     }
> >
> >     /* S sweeps of the coupled update over all N elements. */
> >     for (j = 0; j < S; j++) {
> >         for (i = 0; i < N; i++) {
> >             t[i].a += t[i].b * t[i].f;
> >             t[i].b -= t[i].a * t[i].f;
> >         }
> >     }
> >
> >     /* Use a result so the compiler cannot discard all of the work. */
> >     return (int)t[1].a;
> > }
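To check the denormal explanation against this particular kernel rather than assume it, one could instrument the inner loop and classify every value it stores. The sketch below is a hypothetical instrumented copy of the program above: run_kernel, struct fp_counts and the heap allocation are my additions, and it relies only on C99 `<math.h>`. It also counts infinities and NaNs, since those are the other special values that might take a slow path on some hardware.

```c
#include <math.h>    /* fpclassify, NAN */
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical counters for special values seen during the run. */
struct fp_counts {
    long subnormal;
    long infinite;
    long nan;
};

struct elem {
    double a, b, f;
};

/* Bumps the matching counter; normal numbers and zeros are ignored. */
static void classify(double x, struct fp_counts *c)
{
    switch (fpclassify(x)) {
    case FP_SUBNORMAL: c->subnormal++; break;
    case FP_INFINITE:  c->infinite++;  break;
    case FP_NAN:       c->nan++;       break;
    default:           break;
    }
}

/* Same recurrence as the benchmark: s sweeps over n elements (n >= 2),
   returning t[1].a. Every stored value is classified into *c.
   Returns NAN if the allocation fails. */
double run_kernel(int n, int s, struct fp_counts *c)
{
    struct elem *t = malloc((size_t)n * sizeof *t);
    double result;
    int i, j;

    c->subnormal = c->infinite = c->nan = 0;
    if (t == NULL)
        return NAN;

    for (i = 0; i < n; i++) {
        t[i].a = 0.0;
        t[i].b = 1.0;
        t[i].f = i * 0.25;
    }

    for (j = 0; j < s; j++) {
        for (i = 0; i < n; i++) {
            t[i].a += t[i].b * t[i].f;
            classify(t[i].a, c);
            t[i].b -= t[i].a * t[i].f;
            classify(t[i].b, c);
        }
    }

    result = t[1].a;
    free(t);
    return result;
}
```

With the original sizes this would be called as run_kernel(4000, 1000, &c), after which the three counters can be printed.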
> >
> > --
> > :wq
> > ^X^Cy^K^X^C^C^C^C
> >
> >