On 2018-05-27 21:22, Sven Olsen wrote:
Interesting. After mucking around in the VM for the purposes of applying your patch, I've started daydreaming about writing some instrumentation hooks of my own.


Do you have any words of wisdom for someone just starting down this path? (It sounds like, maybe, implementing some sort of sampling-based instrumentation using OP_HALT-like hooks turns out to be faster than hacking a new switch into the core VM?)

A while ago I hacked a stupid-simple profiler into Lua - it just constantly
and unconditionally barfs tons of info into fd 3 (or /dev/null, if you
don't assign that from the shell (3>somewhere))… It counts all
calls, instructions (one counter per instruction), allocations
(increases counters for _all_ functions on the call stack), loads
(load,loadstring,require,… incl. dumping the loaded code), … and the
counters get dumped whenever the thing is GCed/freed.  All of that
causes it to run roughly 1.5-1.7x as long.  (Additionally dumping a full
stack trace every Nth call makes it… 1.7x (every 107th), 2.8x (every
11th), 13x (trace _ALL_ the calls!) slower in total for a very
call-heavy program (~70M calls, otherwise just summing values) with
mostly <=5 stack depth.)

So if you have some idea of what info you need, you can probably afford
to have that unconditionally enabled in a profiling build.

I don't have the time to clean up the changes & turn them into a patch,
but here's a bunch of notes that may be useful:

 *  lobject.h/ClosureHeader: nice place for counters (ncalls,nbytes,…)
    (initialize in lfunc.c/luaF_new[CL]closure)
 *  ldo.c/luaD_precall: all calls thru here -> ncalls++ / dump stack
 *  lapi.c/pushcclosure: Kill the `if (n == 0)` branch to disable light
    C functions / force C closures, so that you have the counter fields
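
A minimal sketch of the shared-header idea, using toy types rather than Lua's real ones (TOY_CLOSURE_HEADER, ToyClosure, prof_ncalls, prof_nbytes, toy_call and toy_double are all made-up names): the counters live in the header that both closure kinds share, get zeroed when the closure is created, and get bumped in the one function every call goes through. A bare C function pointer has no header to hold counters, which is presumably why the notes suggest forcing C closures.

```c
#include <assert.h>
#include <stddef.h>

/* Fields shared by both closure kinds, in the spirit of ClosureHeader.
   The prof_* names are hypothetical. */
#define TOY_CLOSURE_HEADER \
  unsigned char is_c; size_t prof_ncalls; size_t prof_nbytes

typedef struct { TOY_CLOSURE_HEADER; int (*f)(int); } ToyCClosure;
typedef struct { TOY_CLOSURE_HEADER; const int *code; } ToyLClosure;

typedef union {
  struct { TOY_CLOSURE_HEADER; } h;  /* common initial sequence */
  ToyCClosure c;
  ToyLClosure l;
} ToyClosure;

/* Constructors zero the counters (the luaF_new*closure step). */
static void toy_new_cclosure(ToyClosure *cl, int (*f)(int)) {
  assert(f != NULL);
  cl->c.is_c = 1; cl->c.prof_ncalls = 0; cl->c.prof_nbytes = 0;
  cl->c.f = f;
}

static void toy_new_lclosure(ToyClosure *cl, const int *code) {
  assert(code != NULL);
  cl->l.is_c = 0; cl->l.prof_ncalls = 0; cl->l.prof_nbytes = 0;
  cl->l.code = code;
}

/* Example "C function" to wrap in a closure. */
static int toy_double(int x) { return 2 * x; }

/* Stand-in for the precall path: count first, then dispatch.
   The "Lua" case just adds code[0] to the argument. */
static int toy_call(ToyClosure *cl, int arg) {
  cl->h.prof_ncalls++;
  return cl->h.is_c ? cl->c.f(arg) : cl->l.code[0] + arg;
}
```

Since the counter sits in the common header, toy_call never has to care which kind of closure it was handed.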

 *  lobject.h/Proto: add instruction counters?
    (NULL-init in lfunc.c/luaF_newproto, alloc in lparser.c/close_func
    using luaM_newvector(L, fs->pc, size_t) and in lundump.c/LoadCode
    using luaM_newvector(S->L, n, size_t), then zero-init all counts)
    (change size_t to whatever counter size you're using)
 *  lvm.c/vmfetch, lvm.c/donextjump: increment the instr. counter
    (prof_icounts is whatever you're calling the instr. counter field)
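
To make the counter layout concrete, here is a toy interpreter (not Lua's VM; ToyProto, ToyInstr and the opcodes are invented for illustration) that keeps one counter per instruction in an array parallel to the code array and indexes it by the instruction's offset. If I read the 5.3 sources right, donextjump executes the following jump without going back through vmfetch, which would explain why both spots need the increment; the toy loop has a single fetch point, so it only needs one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { OP_ADD, OP_JMP_IF_LT, OP_HALT } ToyOp;

typedef struct { ToyOp op; int a; int b; } ToyInstr;

typedef struct {
  const ToyInstr *code;
  size_t sizecode;
  size_t *prof_icounts;  /* one counter per instruction (hypothetical) */
} ToyProto;

/* Allocate the counters and zero them; calloc does both at once. */
static int toy_alloc_counts(ToyProto *p) {
  p->prof_icounts = calloc(p->sizecode, sizeof *p->prof_icounts);
  return p->prof_icounts != NULL;
}

/* Run until OP_HALT and return the accumulator. */
static int toy_run(ToyProto *p) {
  const ToyInstr *pc = p->code;
  int acc = 0;
  assert(p->prof_icounts != NULL);
  for (;;) {
    const ToyInstr *i = pc++;          /* fetch */
    p->prof_icounts[i - p->code]++;    /* count the fetched instruction */
    switch (i->op) {
      case OP_ADD:                     /* acc += a */
        acc += i->a;
        break;
      case OP_JMP_IF_LT:               /* if acc < a, jump by b */
        if (acc < i->a) pc += i->b;
        break;
      case OP_HALT:
        return acc;
    }
  }
}
```

Dumping prof_icounts next to a listing of the code then shows which instructions in a function are hot.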

 *  lstate.h/global_State: per Lua state, and
    lstate.h/lua_State: per thread within a Lua state
    (init in lstate.c: lua_newstate, preinit_thread (no allocs) or
    f_luaopen, lua_newthread (allocs ok))

 *  lmem.c/luaM_realloc_: all allocations go through here
 *  do whatever you do BEFORE the realloc call, as it might be moving
    the stuff that you wanted to touch
 *  if tracking allocations, blame (nsize-realosize) bytes if that's >0
 *  if block == NULL, osize is not an old size but a type hint (LUA_TFOO)
    and may be != 0; may want to count those to see who's slowing down
    the GC by creating lots of objects (tables, strings, …)
 *  may also want to walk a few stack levels & track indirect counts:
    just blaming your low-level constructors (, map, …)
    doesn't tell you which parts of the code are actually causing this
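
The bookkeeping from these allocation notes can be sketched outside the core with a function shaped like the public lua_Alloc callback (the notes patch lmem.c/luaM_realloc_ instead, so treat this as an analogy, not the actual patch). AllocStats and counting_alloc are made-up names, and the tag values are copied from what I believe lua.h defines in 5.3; check them against your version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed to mirror LUA_TSTRING, LUA_TTABLE and LUA_NUMTAGS in 5.3. */
enum { TOY_TSTRING = 4, TOY_TTABLE = 5, TOY_NUMTAGS = 9 };

typedef struct {
  size_t bytes_grown;               /* sum of positive size deltas */
  size_t new_objects[TOY_NUMTAGS];  /* fresh blocks per type hint */
  size_t new_other;                 /* fresh blocks, hint out of range */
} AllocStats;

/* Same parameter order as lua_Alloc: (ud, block, osize, nsize). */
static void *counting_alloc(void *ud, void *block,
                            size_t osize, size_t nsize) {
  AllocStats *st = ud;
  /* With no old block, osize is a type hint, not a size. */
  size_t realosize = block ? osize : 0;
  assert(st != NULL);

  /* Do the bookkeeping BEFORE reallocating. */
  if (nsize > realosize)
    st->bytes_grown += nsize - realosize;
  if (block == NULL && nsize > 0) {
    if (osize < TOY_NUMTAGS) st->new_objects[osize]++;
    else st->new_other++;
  }

  if (nsize == 0) {  /* a free request */
    free(block);
    return NULL;
  }
  return realloc(block, nsize);
}
```

In the real thing you would add the delta to the counters of the functions on the call stack instead of a single global total.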

 *  when you want to touch the stack, guard:
    if (!G(L)->version || !L->ci)  return; /* still building state */
    CallInfo *ci = L->ci;
    if (ci->previous == ci->next)  return; /* setting up first func */
    (this *seems* to take care of every wonky stack state?)
 *  stack traversal: just walk the ci->previous chain until NULL
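
Put together, the guard plus the walk might look roughly like this. ToyState and ToyCallInfo are stand-ins with only the fields the notes mention (the `initialized` flag plays the role of the G(L)->version check), and toy_trace is a made-up helper that collects one pointer per frame, newest first.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyCallInfo {
  struct ToyCallInfo *previous, *next;
  const void *func;  /* whatever identifies the running function */
} ToyCallInfo;

typedef struct {
  int initialized;   /* stand-in for the G(L)->version check */
  ToyCallInfo *ci;   /* current call frame */
} ToyState;

/* Write up to max frame identifiers into out; return how many. */
static size_t toy_trace(const ToyState *L, const void **out, size_t max) {
  const ToyCallInfo *ci;
  size_t n = 0;
  assert(out != NULL || max == 0);
  if (!L->initialized || !L->ci) return 0;  /* still building state */
  ci = L->ci;
  if (ci->previous == ci->next) return 0;   /* setting up first func */
  for (; ci != NULL && n < max; ci = ci->previous)
    out[n++] = ci->func;                    /* walk towards the base */
  return n;
}
```

Capping the walk at max frames also gives you the "walk a few stack levels" variant for blaming allocations on callers.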

 *  dumping accumulated info from the lua*_free* functions works well
    if you properly close the state at the end (so no os.exit(foo), but
    os.exit(foo,true) is ok – or patch os.exit)

That's done against 5.3.4, but only used a couple of times so far, so
the above may be incomplete / missing critical things / contain bugs.

(For stack traces, you may want to make lgc.c/freeobj (cases LUA_TLCL,
LUA_TCCL) and lfunc.c/luaF_freeproto report the closure kind (C/Lua) /
closure->cfunc (gco2ccl(o)->f) / closure->proto (gco2lcl(o)->p) /
proto->source (f->source) mapping so your stack traces can simply be a
list of closure pointers, no need to constantly translate those when you
can do that later.  Then just keep a counter or timer in the state &
increment/check in ldo.c/luaD_precall whether you should dump a trace…
should be good enough, and fast.)
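
The "counter in the state" part could be as small as the following sketch. ToySampler and its functions are invented names; in the real patch the fields would sit in lua_State or global_State and the tick would go into luaD_precall.

```c
#include <assert.h>

typedef struct {
  unsigned long ncalls;   /* all calls seen */
  unsigned long ntraces;  /* traces requested so far */
  unsigned interval;      /* dump a trace every Nth call */
  unsigned countdown;     /* calls left until the next trace */
} ToySampler;

static void toy_sampler_init(ToySampler *s, unsigned interval) {
  assert(interval > 0);
  s->ncalls = 0;
  s->ntraces = 0;
  s->interval = interval;
  s->countdown = interval;
}

/* Call once per function call; returns 1 when a trace is due. */
static int toy_sampler_tick(ToySampler *s) {
  s->ncalls++;
  if (--s->countdown > 0) return 0;
  s->countdown = s->interval;  /* rearm */
  s->ntraces++;
  return 1;
}
```

With interval 11 this asks for a trace on calls 11, 22, 33 and so on; the common path is one increment, one decrement and one compare per call.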

Have fun!
-- nobody