Arran Cudbard-Bell wrote:
> As much as possible the code avoids [...] caching array/hash lookups [...]

Umm, this is not always beneficial. Manual caching often increases
the number of live variables, whereas the compiler can often do
better on its own with hoisting or forwarding, at least in tight
loops.
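A minimal sketch of the point (hypothetical code, not from rti.lua): both loops below compute the same sum, and LuaJIT can usually hoist the loop-invariant `t.f` lookup by itself, so the manual cache mostly just adds another live variable.

```lua
-- Hedged sketch: direct lookup vs. manually cached lookup.
local t = { f = 2 }

local function sum_direct(n)
  local s = 0
  for i = 1, n do
    s = s + t.f * i          -- invariant lookup, left to the compiler
  end
  return s
end

local function sum_cached(n)
  local f = t.f              -- manual caching: one more live variable
  local s = 0
  for i = 1, n do
    s = s + f * i
  end
  return s
end

assert(sum_direct(10) == sum_cached(10))
```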

> the main overhead from this function (we have some benchmarking
> stuff using nanosecond timers) is actually the call to
> ffi.string to convert the char array to a Lua string (something
> like 0.4 microseconds for the call and 0.6 microseconds for the
> conversion).

Ouch, that's 3000 cycles, which is a lot. How long are these
strings? How similar are these strings? Are you sure the overhead
isn't caused by too many hash collisions?

[Of course the best idea is not to allocate strings at all,
if you're aiming for top performance.]
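One way to avoid the allocation (a hedged sketch, assuming LuaJIT's FFI; `buf_equals` is a hypothetical helper, not part of any API): when a C buffer only needs to be *compared* against an existing Lua string, memcmp the bytes directly instead of interning a new string with ffi.string.

```lua
-- Hedged sketch: compare a C char buffer to a Lua string without
-- creating (and hashing) a new Lua string object.
local ffi = require("ffi")
ffi.cdef[[ int memcmp(const void *a, const void *b, size_t n); ]]

local function buf_equals(buf, len, str)
  -- cheap length check first, then a raw byte comparison
  return #str == len and ffi.C.memcmp(buf, str, len) == 0
end

local buf = ffi.new("char[8]", "abc")  -- initialized from the string
assert(buf_equals(buf, 3, "abc"))
assert(not buf_equals(buf, 3, "abd"))
```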

> The compiler trace for the artificial capture that gives 80K
> learns per second looks pretty clean:
> [TRACE   1 rti.lua:1501 loop]
> [TRACE   2 rti.lua:1006 loop]
> [TRACE   3 (1/0) rti.lua:1502 -> 1]
> [TRACE   4 (1/5) rti.lua:1502 -> 1]
> [TRACE   5 (2/2) rti.lua:1007 -> 2]
> [TRACE   6 (3/0) rti.lua:1502 loop]
> [TRACE   7 rti.lua:1242 return]
> [TRACE   8 (5/1) rti.lua:1006 -> 1]
> [TRACE   9 (4/0) rti.lua:1502 -> 1]
> [TRACE  10 (6/5) rti.lua:1358 -> 2]
> [TRACE  11 (6/8) rti.lua:1502 -> 1]
> [TRACE --- rti.lua:1276 -- leaving loop in root trace at rti.lua:1280]
> [TRACE  12 (6/0) rti.lua:1502 loop]
> [TRACE  13 (9/3) rti.lua:1312 -> 1]
> [TRACE --- rti.lua:1608 -- inner loop in root trace at rti.lua:1502]
> [TRACE  14 (7/0) rti.lua:1244 -> 1]
> [TRACE  15 (12/20) rti.lua:1502 -> 1]
> [TRACE  16 (12/0) rti.lua:1502 -> 1]

No, I don't think this is a good set of traces. Most of them go
back to the loop at trace 1, which indicates branchy code. Draw
the graph and you'll see it's e.g. a linear sequence for traces
1,3,6,12,16. All of these diverge at rti.lua:1502 and, even worse,
these are /0 side traces. This usually indicates some unstable
precondition, e.g. type instability, hash instability or one of
many other possible causes.
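A minimal sketch of the kind of type instability meant here (hypothetical code, not from rti.lua): a value whose type flips between number and string forces the JIT to attach a side trace for each type it encounters at that point.

```lua
-- Hedged sketch: list[i] is sometimes a number, sometimes a string,
-- so the type guard on v keeps failing and spawns side traces.
local function sum(list)
  local s = 0
  for i = 1, #list do
    local v = list[i]
    if type(v) == "number" then
      s = s + v              -- numeric path
    else
      s = s + #v             -- string path: different trace
    end
  end
  return s
end

assert(sum({1, 2, "abc", 4}) == 10)
```

Keeping the element type uniform (e.g. storing lengths instead of strings) removes the divergence.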

You could check the relative time spent in the different traces to
estimate the overall effect. Look for LUAJIT_USE_PERFTOOLS in the
LuaJIT sources.

I'd have to see the -jdump=+rsx output to be able to tell you
what's happening there (but don't send big dumps to the list).
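For reference, the same dump can be enabled from inside the program via the jit.dump module (this is what -jdump=+rsx does from the command line); a hedged sketch, guarded with pcall since the module only exists under LuaJIT:

```lua
-- Hedged sketch: programmatic equivalent of -jdump=+rsx,dump.txt.
-- Only available under LuaJIT, hence the pcall guard.
local ok, dump = pcall(require, "jit.dump")
if ok then
  dump.on("+rsx", "dump.txt")  -- write the trace dump to dump.txt
end
```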

> It seems the areas of the code being traced change wildly
> depending on whether functions are locally scoped, and other
> seemingly random things like re-using variable names even if the
> variables are locally scoped.

The tracer uses heuristic profiling, which is affected e.g. by
memory layout. You may want to temporarily turn off ASLR to get
(somewhat) stable behavior.

> Does tighter variable and function scope help the JIT optimize
> that much?

Not that much. Reducing the number of live variables does help to
improve code quality, especially on CPUs with few registers (x86).
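One way to cut down live variables (a hedged sketch with hypothetical code): a do/end block ends the lifetime of temporaries early, so fewer values are live across the hot loop that follows.

```lua
-- Hedged sketch: a and b die at the end of the do/end block, so only
-- base, s and i are live inside the loop.
local function f(n)
  local base
  do
    local a, b = n * 2, n * 3  -- temporaries, dead after this block
    base = a + b
  end
  local s = 0
  for i = 1, n do
    s = s + base
  end
  return s
end

assert(f(4) == 80)
```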

> Should I be looking at reducing the number of 'public'
> (non-local) functions and variables in modules? 

Intra-module calls should always be to local functions, anyway.
Inter-module coupling usually doesn't affect performance that
much, unless you've messed up coupling/cohesion during the design.
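The usual module idiom already gives you this (a hedged sketch, hypothetical module): define functions as locals so intra-module calls resolve to locals, and export them in the returned table for outside callers.

```lua
-- Hedged sketch of the standard module pattern.
local M = {}

local function helper(x)       -- local: cheap intra-module calls
  return x + 1
end

function M.twice(x)
  return helper(helper(x))     -- calls the local, not a table field
end

M.helper = helper              -- still exported for external callers

assert(M.twice(1) == 3)
return M
```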