- Subject: Re: OP_HALT is really useful
- From: nobody <nobody+lua-list@...>
- Date: Mon, 28 May 2018 06:06:56 +0200
On 2018-05-27 21:22, Sven Olsen wrote:
> Interesting. After mucking around in the VM for the purposes of
> applying your patch, I've started daydreaming about writing some
> instrumentation hooks of my own.
*snip*
> Do you have any words of wisdom for someone just starting down this
> path? (It sounds like, maybe, implementing some sort of
> sampling-based instrumentation using OP_HALT-like hooks turns out to
> be faster than hacking a new switch into the core VM?)
A while ago I hacked a stupid-simple profiler into Lua: it just constantly
and unconditionally barfs tons of info into fd 3 (or /dev/null, if you
don't assign fd 3 from the shell with 3>somewhere)… It counts all
calls, instructions (one counter per instruction), allocations
(increasing the counters of _all_ functions on the call stack), loads
(load, loadstring, require, …, including dumping the loaded code), … and
the counters get dumped whenever the object is GCed/freed. All of that
makes it run roughly 1.5-1.7x as long. (Additionally dumping a full
stack trace every Nth call makes it 1.7x (every 107th call), 2.8x (every
11th), or 13x (trace _ALL_ the calls!) slower in total, for a very
call-heavy program (~70M calls, otherwise just summing values) with a
stack depth of mostly <=5.)
So if you have some idea of what info you need, you can probably afford
to have that unconditionally enabled in a profiling build.
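A minimal sketch of the per-function counters described above, using toy stand-in types rather than Lua's real ones: ToyClosure, toy_newclosure, toy_precall and toy_freeclosure are all hypothetical. It shows the three moving parts: counters zeroed at creation, bumped on every call, and dumped when the object is freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Lua closure. In the real patch the counters
   would sit in lobject.h's ClosureHeader. */
typedef struct ToyClosure {
  const char *name;      /* stands in for proto->source or the C function */
  unsigned long ncalls;  /* bumped on every call */
  size_t nbytes;         /* filled in by an allocation hook (not shown) */
} ToyClosure;

/* Like luaF_newCclosure / luaF_newLclosure: counters start at zero. */
static ToyClosure *toy_newclosure(const char *name) {
  ToyClosure *cl = malloc(sizeof *cl);
  if (cl == NULL) return NULL;
  cl->name = name;
  cl->ncalls = 0;
  cl->nbytes = 0;
  return cl;
}

/* Like the hook in ldo.c/luaD_precall: every call passes through here. */
static void toy_precall(ToyClosure *cl) {
  assert(cl != NULL);
  cl->ncalls++;
}

/* Like dumping from the free path: report the counters, then release the
   object, so nothing needs to be collected separately at exit. */
static void toy_freeclosure(ToyClosure *cl, FILE *out) {
  fprintf(out, "%p %s calls=%lu bytes=%zu\n",
          (void *)cl, cl->name, cl->ncalls, cl->nbytes);
  free(cl);
}
```

Dumping at free time means the data only comes out if the state is actually closed at the end; passing the output FILE in keeps the choice of fd 3 (or anything else) outside the counting code.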
I don't have the time to clean up the changes & turn them into a patch,
but here's a bunch of notes that may be useful:
* lobject.h/ClosureHeader: nice place for counters (ncalls,nbytes,…)
(initialize in lfunc.c/luaF_new[CL]closure)
* ldo.c/luaD_precall: all calls go through here -> ncalls++ / dump stack
* lapi.c/lua_pushcclosure: Kill the `if (n == 0)` branch to disable light
C functions / force C closures, so that you have the counter fields
* lobject.h/Proto: add instruction counters?
(NULL-init in lfunc.c/luaF_newproto, alloc in lparser.c/close_func
using luaM_newvector(L, fs->pc, size_t) and in lundump.c/LoadCode
using luaM_newvector(S->L, n, size_t), then zero-init all counts)
(change size_t to whatever counter size you're using)
* lvm.c/vmfetch, lvm.c/donextjump: increase instr.-counter:
cl->p->prof_icounts[(ci->u.l.savedpc)-(cl->p->code)]++;
(prof_icounts is whatever you're calling the instr. counter field)
* lstate.h/global_State: place for per-Lua-state data (counters etc.), and
lstate.h/lua_State: place for per-thread data within a Lua state
(init in lstate.c: lua_newstate, preinit_thread (no allocs) or
f_luaopen, lua_newthread (allocs ok))
* lmem.c/luaM_realloc_: all allocations go through here
* do whatever you do BEFORE the realloc call, as it might be moving
the stuff that you wanted to touch
* if tracking allocations, blame (nsize-realosize) bytes if that's >0
* if block == NULL, osize may be != 0, but then it's a type hint
(LUA_TFOO) rather than a size; you may want to count those to see who's
slowing down the GC by creating lots of objects (tables, strings, …)
* you may also want to walk a few stack levels & track indirect counts:
just blaming your low-level constructors (Object.new, map, …)
doesn't tell you which parts of the code are actually causing this
* when you want to touch the stack, guard:
if (!G(L)->version || !L->ci) return; /* still building state */
CallInfo *ci = L->ci;
if (ci->previous == ci->next) return; /* setting up first func */
(this *seems* to take care of every wonky stack state?)
* stack traversal: just walk the ci->previous chain until NULL
* dumping accumulated info from the lua*_free* functions works well
if you properly close the state at the end (so no os.exit(foo), but
os.exit(foo,true) is ok – or patch os.exit)
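A minimal sketch of the per-instruction counter notes (Proto field, allocation, vmfetch increment), assuming a cut-down toy Proto: ToyProto, toy_alloc_icounts and toy_fetch are hypothetical; only the field name prof_icounts comes from the notes. The point it illustrates is the indexing: the counter slot is the instruction's offset savedpc - code, taken before savedpc is advanced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Instruction;   /* Lua 5.3 uses 32-bit instructions */

/* Hypothetical cut-down Proto: just the code and the added counter array. */
typedef struct ToyProto {
  const Instruction *code;
  int sizecode;
  size_t *prof_icounts;          /* NULL until allocated */
} ToyProto;

/* Like the allocation in lparser.c/close_func or lundump.c/LoadCode:
   one counter per instruction. calloc also does the zero-init step. */
static int toy_alloc_icounts(ToyProto *p) {
  p->prof_icounts = calloc((size_t)p->sizecode, sizeof *p->prof_icounts);
  return p->prof_icounts != NULL;
}

/* Like the vmfetch change: count the instruction at savedpc, then fetch it
   and advance savedpc, mirroring i = *(ci->u.l.savedpc++). */
static Instruction toy_fetch(ToyProto *p, const Instruction **savedpc) {
  ptrdiff_t idx = *savedpc - p->code;
  assert(idx >= 0 && idx < p->sizecode);
  p->prof_icounts[idx]++;
  return *(*savedpc)++;
}
```

Any path that executes an instruction without going through the fetch (like the donextjump case in the notes) needs the same increment, or those instructions will be undercounted.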
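A minimal sketch of the allocation-blaming notes (luaM_realloc_, the stack guard, and the ci->previous walk), again with hypothetical toy types: ToyFunc, ToyCallInfo, ToyState, toy_stack_ready, toy_realloc and the TOY_NHINTS bound are all made up for illustration. The `ready` flag stands in for `G(L)->version`. It does all bookkeeping before the realloc, blames only growth (nsize - realosize > 0), and counts type hints when block == NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NHINTS 16   /* hypothetical bound on type-hint values to count */

/* Hypothetical stand-ins for a function's counters, CallInfo and lua_State. */
typedef struct ToyFunc { size_t nbytes; } ToyFunc;

typedef struct ToyCallInfo {
  struct ToyCallInfo *previous, *next;
  ToyFunc *func;                 /* NULL for the base frame */
} ToyCallInfo;

typedef struct ToyState {
  int ready;                     /* stands in for G(L)->version != NULL */
  ToyCallInfo *ci;               /* NULL while the state is being built */
  size_t nobjects[TOY_NHINTS];   /* objects created, indexed by type hint */
} ToyState;

/* The guard from the notes: only touch the stack once it is usable. */
static int toy_stack_ready(const ToyState *L) {
  if (!L->ready || L->ci == NULL) return 0;       /* still building state */
  if (L->ci->previous == L->ci->next) return 0;   /* setting up first func */
  return 1;
}

/* Like the accounting added to lmem.c/luaM_realloc_. All bookkeeping
   happens BEFORE the realloc, which may move the block. */
static void *toy_realloc(ToyState *L, void *block, size_t osize, size_t nsize) {
  assert(L != NULL);
  size_t realosize = (block != NULL) ? osize : 0;
  if (block == NULL && osize < TOY_NHINTS)
    L->nobjects[osize]++;        /* osize is a type hint here, not a size */
  if (nsize > realosize && toy_stack_ready(L)) {
    /* blame the growth on every function on the stack (walk ci->previous) */
    for (ToyCallInfo *ci = L->ci; ci != NULL; ci = ci->previous)
      if (ci->func != NULL) ci->func->nbytes += nsize - realosize;
  }
  if (nsize == 0) {
    free(block);
    return NULL;
  }
  return realloc(block, nsize);
}
```

Blaming every frame (not just the top one) is what makes constructor-heavy code readable: a generic `Object.new` gets charged, but so does whatever called it. A real patch would probably cap the walk at a few levels, as the notes suggest.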
That's done against 5.3.4, but only used a couple of times so far, so
the above may be incomplete / missing critical things / contain bugs.
(For stack traces, you may want to make lgc.c/freeobj (cases LUA_TLCL,
LUA_TCCL) and lfunc.c/luaF_freeproto report, for each freed object, the
closure kind (C/Lua), closure->cfunc (gco2ccl(o)->f), closure->proto
(gco2lcl(o)->p), and proto->source (f->source). With that mapping in the
log, your stack traces can simply be lists of closure pointers; there's
no need to constantly translate them when you can do that later. Then
just keep a counter or timer in the state, and increment/check it in
ldo.c/luaD_precall to decide whether to dump a trace… That should be
good enough, and fast.)
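A minimal sketch of the counter-based sampling idea, under toy assumptions: ToyFrame, ToySampler, toy_sample_call and toy_report_free are hypothetical, and the log line formats are made up. On every Nth call it writes the trace as raw closure pointers, and a separate free-time record says what each pointer meant.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical frame: a trace records only the closure pointer per level. */
typedef struct ToyFrame {
  const struct ToyFrame *previous;
  const void *closure;
} ToyFrame;

typedef struct ToySampler {
  unsigned long ncalls;    /* calls seen so far */
  unsigned long every;     /* dump a trace on every Nth call (N >= 1) */
  unsigned long ntraces;   /* traces written so far */
  FILE *out;
} ToySampler;

/* Called from the (toy) precall hook with the newly pushed frame. */
static void toy_sample_call(ToySampler *s, const ToyFrame *top) {
  assert(s->every > 0);
  if (++s->ncalls % s->every != 0) return;
  fputs("trace", s->out);
  for (const ToyFrame *f = top; f != NULL; f = f->previous)
    fprintf(s->out, " %p", (void *)f->closure);  /* raw pointer, resolved later */
  fputc('\n', s->out);
  s->ntraces++;
}

/* Called when a closure is freed: log what its pointer meant, so traces can
   be translated offline. 'what' stands in for proto->source or the C function. */
static void toy_report_free(FILE *out, const void *closure, int is_c,
                            const char *what) {
  fprintf(out, "free %p %s %s\n", (void *)closure, is_c ? "C" : "Lua", what);
}
```

Because an address can be reused once its closure is freed, an offline resolver would have to read the log in order and apply each `free` record to the traces that precede it. A sampling interval that is not a round number (like the 107 and 11 above) helps avoid lining up with periodic call patterns.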
Have fun!
-- nobody