A short message to let the list know the source of my problem: it seems
that the code caused a lot of calculations resulting in 'denormal
numbers', which tend to be handled much more slowly on some hardware [1].
My solution (workaround?) was to enable SSE and add the -ffast-math flag
to gcc to tell the compiler I don't really care about very precise answers.
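
For anyone unfamiliar with the term, here is a rough sketch of what a
denormal is, assuming IEEE 754 doubles and a C99 compiler. It is not the
benchmark code, and the helper names is_denormal and
count_denormal_halvings are made up for this illustration. It repeatedly
halves the smallest normal double and counts how many values are
classified as denormal before the result reaches zero.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: 1 if x is a denormal (subnormal) double, else 0. */
static int is_denormal(double x)
{
        return fpclassify(x) == FP_SUBNORMAL;
}

/* Hypothetical helper: halve DBL_MIN until it underflows to zero and
 * count how many of the intermediate values are denormal. */
static int count_denormal_halvings(void)
{
        double x = DBL_MIN;     /* smallest positive normal double */
        int count = 0;

        for (;;) {
                x /= 2;
                if (x == 0.0)
                        break;
                if (is_denormal(x))
                        count++;
        }

        return count;
}
```

With default floating point settings I would expect
count_denormal_halvings() to equal DBL_MANT_DIG - 1. Build this without
-ffast-math: that flag tells gcc it may ignore such corner cases, so the
classification is probably not reliable with it enabled.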

I'm not sure how denormals affect luajit, but it seems that in this case
this is no problem for the luajit implementation.


1. http://en.wikipedia.org/wiki/Denormal_number#Performance_issues


* On Tue Dec 20 11:24:23 +0100 2011, Eike Decker wrote:
 
> Just a guess into the blue, but maybe it's a double/int conversion in the
> first loop? You are mixing int and double values there... what if you
> make sure that all arithmetical operations are done in double precision?
> On Dec 20, 2011 11:05 AM, "Ico Doornekamp" <lua@zevv.nl> wrote:
> 
> > * On Tue Dec 20 09:38:28 +0100 2011, steve donovan wrote:
> >
> > > On Tue, Dec 20, 2011 at 10:33 AM, Ross Bencina
> > > <rossb-lists@audiomulch.com> wrote:
> > > > What are your results if you comment these out?
> > >
> > > Precisely my thought. Commented out the print/printf, pushed S up to
> > > 10000, and the luajit and C times are practically the same, at about
> > > 0.6 sec.
> >
> > Ok, so there seems to be some kind of architecture / implementation
> > thing going on. I changed the programs not to print anything, new code
> > attached below. The time results:
> >
> > plain lua:      3.952s
> > luajit:         0.055s
> > gcc -O0:        1.394s
> > gcc -O3:        1.395s
> >
> > in which gcc is still 25 times slower than luajit!
> >
> > Note the above numbers are still measured on my core 2 duo.
> >
> > I just did the same test on an Intel Xeon @ 3.00GHz:
> >
> > plain lua:      1.367s
> > luajit:         0.060s
> > gcc -O0:        0.367s
> > gcc -O3:        0.014s
> >
> > Things start to get interesting here: I think there might be an issue with
> > optimization on my core 2, since there is virtually no difference between
> > the unoptimized and the optimized versions. On the Xeon the results are as
> > expected though, with C coming out ahead of luajit, but not by much.
> >
> > I guess my problem has no place on the lua list after all. Apologies for
> > the noise, I will move to the appropriate mailing list, as soon as I find
> > out where I need to go :)
> >
> > Thanks,
> >
> > Ico
> >
> >
> >
> > ----------------------------------------------------------------------
> >
> > local N = 4000
> > local S = 1000
> >
> > local t = {}
> >
> > for i = 0, N do
> >   t[i] = {
> >      a = 0,
> >      b = 1,
> >      f = i * 0.25
> >   }
> > end
> >
> > for j = 0, S-1 do
> >   for i = 0, N-1 do
> >      t[i].a = t[i].a + t[i].b * t[i].f
> >      t[i].b = t[i].b - t[i].a * t[i].f
> >   end
> > end
> >
> > return t[1].a
> >
> > ----------------------------------------------------------------------
> >
> > #include <stdio.h>
> >
> > #define N 4000
> > #define S 1000
> >
> > struct t {
> >        double a, b, f;
> > };
> >
> > int main(int argc, char **argv)
> > {
> >        int i, j;
> >        struct t t[N];
> >
> >        for(i=0; i<N; i++) {
> >                t[i].a = 0;
> >                t[i].b = 1;
> >                t[i].f = i * 0.25;
> >        }
> >
> >        for(j=0; j<S; j++) {
> >                for(i=0; i<N; i++) {
> >                        t[i].a += t[i].b * t[i].f;
> >                        t[i].b -= t[i].a * t[i].f;
> >                }
> >        }
> >
> >        return t[1].a;
> > }
> >
> > --
> > :wq
> > ^X^Cy^K^X^C^C^C^C
> >
> >
-- 
:wq
^X^Cy^K^X^C^C^C^C