I have read several documents on how to profile a Lua program and find its bottlenecks.
The most commonly mentioned method is `lua_sethook` with `LUA_MASKCALL | LUA_MASKRET`, combined with `lua_getinfo`.
We maintain a call tree built from the `lua_getinfo` results; a tree node looks like the following pseudocode.
`
call_record_t = {
    call_record_t *parent;
    subs = {
        [KEY] = call_record_sub,
        [chunk1:line1] = call_record_sub1,
        [chunk2:line2] = call_record_sub2,
    },
    inclusive = 100 ms,
    exclusive = 30 ms,
    hitcount = 12 times,
    error_occur = 0 times,
}
`
On LUA_HOOKCALL we enter the sub call_record selected by `KEY` (creating it if it does not exist), and on LUA_HOOKRET we exit back to the parent.
The `KEY` above is a signature that uniquely identifies a function; chunkname plus linedefined is adequate as the KEY.
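A minimal sketch of the call tree described above, in plain C. The names `call_record`, `record_enter`, `record_exit` and `record_exclusive` are hypothetical, timestamps are passed in by the caller, and children are kept in a linked list rather than a hash table to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical call-tree node; the fields follow the pseudocode above. */
typedef struct call_record {
    struct call_record *parent;
    struct call_record *first_child;   /* children kept in a simple linked list */
    struct call_record *next_sibling;
    char     key[64];                  /* "chunkname:linedefined" */
    uint64_t inclusive_ns;             /* time in this node and its callees */
    uint64_t child_ns;                 /* time spent in direct callees */
    uint64_t enter_ns;                 /* timestamp of the pending entry */
    unsigned hitcount;
} call_record;

/* Find the child of `cur` with this key, creating it on first use. */
static call_record *record_enter(call_record *cur, const char *key, uint64_t now_ns) {
    call_record *c;
    for (c = cur->first_child; c != NULL; c = c->next_sibling)
        if (strcmp(c->key, key) == 0) break;
    if (c == NULL) {
        c = calloc(1, sizeof *c);
        if (c == NULL) abort();
        strncpy(c->key, key, sizeof c->key - 1);  /* calloc keeps it terminated */
        c->parent = cur;
        c->next_sibling = cur->first_child;
        cur->first_child = c;
    }
    c->hitcount++;
    c->enter_ns = now_ns;
    return c;
}

/* Close the pending entry and return to the parent node. */
static call_record *record_exit(call_record *cur, uint64_t now_ns) {
    uint64_t elapsed = now_ns - cur->enter_ns;
    cur->inclusive_ns += elapsed;
    if (cur->parent != NULL) cur->parent->child_ns += elapsed;
    return cur->parent;
}

/* Exclusive time = inclusive time minus time spent in direct callees. */
static uint64_t record_exclusive(const call_record *r) {
    return r->inclusive_ns - r->child_ns;
}
```

The call hook would call `record_enter` and the return hook `record_exit`; the string comparison in `record_enter` is exactly the per-call cost that an integer ID is meant to avoid.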
Since this is a common method, I won't describe it further. My problems are as follows:
1. In our project (a 3D game application), turning on the profiler drops the frame rate from 60 fps to 10~15 fps, so it slows the program down seriously. The profile is then not precise enough, because the profile hook itself affects the host program so much.
The likely reasons are that the hook runs very frequently and is slow: `lua_getinfo` is time-consuming, and the analyser routine has to compute the hash of the KEY to enter a sub call_record, and so on.
2. `call` and `return` events do not always come in pairs, because of `error` and `resume/yield`, so I can't track the call/return flow!
For the first issue, I use multithreading. The profile hook runs in the main thread of the host program; it only takes a sample and pushes it to a queue. The analyser thread pops the sample data from the queue and does the computation. There is one further optimization:
when Lua compiles a function into a `Proto *`, I allocate an ID for the prototype, like this:
`
static void body (LexState *ls, expdesc *e, int ismethod, int line) {
  /* body ->  '(' parlist ')' block END */
  FuncState new_fs;
  BlockCnt bl;
  new_fs.f = addprototype(ls);
  new_fs.f->linedefined = line;
  open_func(ls, &new_fs, &bl);
  /* new: allocate an ID for this prototype */
  new_fs.f->id = luaQ_addhistoryproto(G(ls->L), getstr(new_fs.f->source), line);
  checknext(ls, '(');
  ...
}
`
Here `luaQ_addhistoryproto` keeps the mapping from ID to {chunkname, linedefined}, because the `Proto` itself may be destroyed by the GC.
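As an illustration of such an ID table, here is a minimal sketch in plain C. It is not the actual `luaQ_addhistoryproto`; the names `proto_registry`, `registry_add` and `registry_get` are hypothetical, and returning the same ID when the same chunkname and line are registered again is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry entry: what the analyser needs for the final report. */
typedef struct {
    char *chunkname;    /* private copy, so it outlives the Proto */
    int   linedefined;
} proto_info;

typedef struct {
    proto_info *items;
    int count;
    int capacity;
} proto_registry;

/* Return the ID for (chunkname, linedefined), registering it if it is new.
   The linear scan runs once per compiled prototype, not inside the hook.
   Returns -1 if memory allocation fails. */
static int registry_add(proto_registry *r, const char *chunkname, int linedefined) {
    for (int i = 0; i < r->count; i++)
        if (r->items[i].linedefined == linedefined &&
            strcmp(r->items[i].chunkname, chunkname) == 0)
            return i;
    if (r->count == r->capacity) {
        int ncap = r->capacity ? r->capacity * 2 : 16;
        proto_info *n = realloc(r->items, (size_t)ncap * sizeof *n);
        if (n == NULL) return -1;
        r->items = n;
        r->capacity = ncap;
    }
    size_t len = strlen(chunkname) + 1;
    char *copy = malloc(len);
    if (copy == NULL) return -1;
    memcpy(copy, chunkname, len);
    r->items[r->count].chunkname = copy;
    r->items[r->count].linedefined = linedefined;
    return r->count++;
}

/* Look up an ID; returns NULL for an unknown ID. */
static const proto_info *registry_get(const proto_registry *r, int id) {
    if (id < 0 || id >= r->count) return NULL;
    return &r->items[id];
}
```

Because IDs are dense array indices, the analyser can resolve an ID with a single array access when it builds the report.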
I push only the ID to the queue, which **costs much less** than pushing chunkname and linedefined, and looking up the sub call_record by ID is **cheaper** than hashing chunkname and linedefined. Because the analysis is asynchronous, we have to push the prototype's ID rather than the address of the `Proto *p`: the prototype may already have been collected by the time the analyser handles the sample. Since we keep the mapping from ID to chunkname and linedefined, the analyser can retrieve that information with `arr[call_record->ID]`. Before Lua calls the hook, we execute `ar.proto_id = p->id`.
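The queue between the hook and the analyser thread could look like the following single-producer/single-consumer ring buffer. This is only a sketch under assumptions: it uses C11 `<stdatomic.h>`, and the `sample` layout, `QUEUE_CAPACITY`, `queue_push` and `queue_pop` are hypothetical names, not part of Lua or of our code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact sample: everything the hook hands to the analyser. */
typedef struct {
    uint64_t timestamp_ns;
    int32_t  proto_id;     /* or an index identifying a C function */
    int16_t  ci_depth;
    uint8_t  event;        /* call / return / reset */
    uint8_t  functype;
} sample;

#define QUEUE_CAPACITY 1024u   /* must be a power of two */

typedef struct {
    sample slots[QUEUE_CAPACITY];
    _Atomic size_t head;   /* next slot to read, written only by the consumer */
    _Atomic size_t tail;   /* next slot to write, written only by the producer */
} sample_queue;

/* Called from the hook (producer). Returns false when the queue is full. */
static bool queue_push(sample_queue *q, const sample *s) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == QUEUE_CAPACITY) return false;
    q->slots[tail & (QUEUE_CAPACITY - 1)] = *s;
    /* release: the slot contents become visible before the new tail */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Called from the analyser thread (consumer). Returns false when empty. */
static bool queue_pop(sample_queue *q, sample *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail) return false;
    *out = q->slots[head & (QUEUE_CAPACITY - 1)];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

With one hook thread and one analyser thread no lock is needed. A dropped event would desynchronise the call tree, so a real profiler needs a policy for the full-queue case (wait, grow the queue, or resynchronise).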
`
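lua_Debug {
  ...
  int ci_depth;        /* new: depth of the current CallInfo */
  ...
  union {              /* new: identifies the called function */
    int proto_id;      /* Lua function: prototype ID */
    lua_CFunction f;   /* C function: its address */
  };
  int functype;        /* new: tells which union member is valid */
}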
`
If a C function is called, we also do `ar.f = clCvalue(ci->func)->f;`. Together, functype and the union identify the function that was called. `ar.f` is useful in the final profile report for mapping the address to a C function name through debugging info.
The ultimate goal is to reduce the side effects of the analyser routine. I would rather not call `lua_getinfo`, but without it the function name information for Lua functions (`ar->what`, `ar->name`) will be missing from the final report. A pity, isn't it?
For the second issue, I add a new field `ci_depth` to `CallInfo`, which the analyser checks against its own `cur_depth`, and I define a new hook event, LUA_HOOKRESETCI.
`CallInfo {
  ...
  int ci_depth;  /* new field: depth of this frame in the call chain */
}`
`CallInfo *luaE_extendCI (lua_State *L) {
  ...
  L->nci++;
  ci->ci_depth = ci->previous->ci_depth + 1;  /* record the depth here */
  return ci;
}`
`
int luaD_pcall (lua_State *L, Pfunc func, void *u,
                ptrdiff_t old_top, ptrdiff_t ef) {
  ...
  status = luaD_rawrunprotected(L, func, u);
  if (status != LUA_OK) {  /* an error occurred? */
    ...
    ar.event = LUA_HOOKRESETCI;  /* new hook event here */
    ar.i_ci = old_ci;
    ar.ci_depth = old_ci->ci_depth;
    ...
    L->hook(L, &ar);
    ...
  }
  ...
}
`
The hook then tells the analyser to relocate its position in the call_record hierarchy by walking up the `call_record_t *parent` links:
`
void analyser(int event, int ci_depth, int functype, union {int proto_id; lua_CFunction f;} u, ...)
{
  switch (event) {
  case LUA_HOOKCALL:
    cur_depth++;
    assert(cur_depth == ci_depth);
    if (functype is lua function)
      cur_record = cur_record->subs[u.proto_id];  /* create it if missing */
    if (functype is c function)
      cur_record = ...
    break;
  case LUA_HOOKRET:
    cur_depth--;
    assert(cur_depth == ci_depth);  /* assumes ci_depth is the depth after the return */
    cur_record = cur_record->parent;
    break;
  case LUA_HOOKRESETCI:
    /* unwind to the frame that called pcall; decrement inside the loop
       so that cur_depth ends up equal to ci_depth */
    while (cur_depth > ci_depth) {
      cur_record->error_occur++;
      cur_record = cur_record->parent;
      cur_depth--;
    }
    break;
  }
}
`
So in total, two extra values are passed to the sampling hook: the current CallInfo depth and the prototype ID (or C function address).
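To make the depth-based recovery concrete, here is a small self-contained sketch of the analyser state machine in plain C. Everything in it is a stand-in: `EV_CALL`, `EV_RET` and `EV_RESETCI` replace the hook events, only prototype IDs are handled (no C functions), and the tree lives in a fixed array with hypothetical names `analyser_init` and `analyser_feed`.

```c
#include <assert.h>

/* Stand-ins for LUA_HOOKCALL, LUA_HOOKRET and the custom LUA_HOOKRESETCI. */
enum { EV_CALL, EV_RET, EV_RESETCI };

#define MAX_NODES 256

typedef struct {
    int parent;            /* index of the parent node, -1 for the root */
    int func_id;           /* prototype ID */
    unsigned hitcount;
    unsigned error_occur;
} node;

typedef struct {
    node nodes[MAX_NODES];
    int count;             /* number of nodes in use */
    int cur;               /* index of the current node */
    int cur_depth;         /* the analyser's own view of the call depth */
} analyser;

static void analyser_init(analyser *a) {
    a->nodes[0] = (node){ -1, -1, 0, 0 };   /* root node */
    a->count = 1;
    a->cur = 0;
    a->cur_depth = 0;
}

/* Find or create the child of the current node for func_id. */
static int enter_child(analyser *a, int func_id) {
    for (int i = 1; i < a->count; i++)
        if (a->nodes[i].parent == a->cur && a->nodes[i].func_id == func_id)
            return i;
    assert(a->count < MAX_NODES);
    a->nodes[a->count] = (node){ a->cur, func_id, 0, 0 };
    return a->count++;
}

static void analyser_feed(analyser *a, int event, int ci_depth, int func_id) {
    switch (event) {
    case EV_CALL:
        a->cur = enter_child(a, func_id);
        a->nodes[a->cur].hitcount++;
        a->cur_depth++;
        assert(a->cur_depth == ci_depth);
        break;
    case EV_RET:
        a->cur = a->nodes[a->cur].parent;
        a->cur_depth--;
        assert(a->cur_depth == ci_depth);
        break;
    case EV_RESETCI:
        /* Unwind every frame above the reported depth and count the error. */
        while (a->cur_depth > ci_depth) {
            a->nodes[a->cur].error_occur++;
            a->cur = a->nodes[a->cur].parent;
            a->cur_depth--;
        }
        break;
    }
}
```

Feeding it two nested calls followed by a reset event carrying the depth of the protected call's frame brings the analyser back to that frame, after which later call and return events line up with the tree again.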
Forgive my poor English.
Any suggestions?
Best regards!