• Subject: Re: LuaJIT2 performance for number crunching
• From: Mike Pall <mikelu-1102@...>
• Date: Wed, 16 Feb 2011 17:16:51 +0100

```Daurnimator wrote:
> I got bored... See attached for a lua Dormand–Prince integrator...
> Pretty much a verbatim copy of http://adorio-research.org/wordpress/?p=6565

Looks ok, except for this:

[...]
local a21 = 1/5
local a31 = 3/40
local a32 = 9/40
[...]
local function ode_dormandprince ( fxy , x0 , x1 , y0 , tol , hmax , hmin , maxiter )
[...]
while true do
local K1 = fxy ( x , y )
local K2 = fxy ( x + c2*h, y + h*( a21*K1 ) )
local K3 = fxy ( x + c3*h, y + h*( a31*K1 + a32*K2 ) )
[...]

Rule #437: Never hoist out constants (if possible)!

Always write constants inline, otherwise they are treated as
variables, which generates worse code. And, yes, even plain Lua is
able to fold 1/5 into a constant. It's important to write these
unambiguously -- FP arithmetic is not associative!

So use either of these expressions:

y = (1/5)*x  -- fast
y = x*(1/5)  -- fast

But avoid these expressions:

y = 1*x/5    -- slow!
y = x*1/5    -- slow!
y = x/5      -- slow!

Actually these calculate something different for corner cases,
which may be important, depending on the algorithm. Given the
nature of this algorithm, the fast variant is preferred.

Another thing ... replace:

if delta <= 0.1 then h = h * 0.1
elseif delta >= 4 then h = h * 4.0
else h = delta * h end

if h > hmax then h = hmax end

with:

h = math.min(4.0, math.max(0.1, delta)) * h

hmax = math.max(hmax, h)

But keep the other conditionals, since these are strongly biased
and the uncommon paths are never compiled.

--Mike

```