> > It can also be used for dtrace-like instrumentation points.
> Interesting. After mucking around in the VM for the purposes of applying
> your patch, I've started daydreaming about writing some instrumentation
> hooks of my own...
> However, after reading your posts, I'm guessing that you've implemented
> some instrumentation and profiling hooks...
Our profiler is fairly simple, and I think I described it in another thread on this list a while back, which inspired some "but Lua already handles that" responses that were partially but not completely accurate given our full context. I don't recall in how much detail I responded to those. I don't actually follow this list much in the last few years; I just scan every few months or so to see if any topics of interest have come up.
> Do it in Lua, profile and then decide if you need to modify the Lua interpeter.
Fair points. I did (profile) and I did (have to modify the interpreter).
Our internal profiler came out of finding various ways the traditional count and line hooks produced misleading data.
Just as the OP_HALT patch came out of finding that it was worth modifying the interpreter to change the fundamental performance characteristics* of the usual Lua way of implementing breakpoints with the debug hooks (even in its most optimized form).
All that said, the response above is on point in the sense that it is very easy to create something sophisticated when something simpler will suffice, and that it is very important to prototype an answer to confirm the utility of the data, and to profile the result to understand its characteristics, before going too far down a rabbit hole.
I have two other main pieces of feedback:
- In any solution to a profiling problem, be aware of what biases it will produce in the data due to its own overhead/measurement methods.
For example, using the opcode count hook will disproportionately punish opcode-heavy code even if it isn't particularly slow, and miss expensive opcodes that are causing slowness. A loop doing computation might have lots of math opcodes, but they might be very fast and low overhead. On the other hand, a single slow native call might be just a couple of opcodes, so the count hook will run right over it and blame something a few opcodes down the line. Honestly, that might still be a good enough clue, but depending on what it blames it might take you a while to realize the real culprit.
So you're stuck between a high-frequency (low number) count hook lying to you about opcode-heavy code that's not really slow, and a low-frequency (high number) count hook potentially pointing to completely irrelevant code, potentially very far from the real culprit.
Similarly, a hook that gathers a full stack trace can disproportionately punish code that tends to be called in a deep call stack, even if it isn't slow (because gathering the stack itself is slow when there are lots of frames). The posted Lua-based profiler code smartly avoids this by sampling just one frame, not the whole stack. That does mean it can be harder to reconstruct the execution to understand the context, but it avoids recursion-depth distortions. In practice, if the call tree is a real problem, you'll see multiple frames high in the results and can probably piece it together anyway. You can also pick a middle ground where you walk some fixed number of frames upward to get a little more context with less distortion in the data.
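A rough sketch of that single-frame style (my own illustration, not the posted profiler's code; the interval and all names are invented):

```lua
-- Sketch of a single-frame sampler: every `interval` opcodes, record
-- only the innermost frame instead of walking the whole stack, so deep
-- call chains don't pay a gathering cost proportional to their depth.
local samples = {}

local function sample_hook()
  -- level 2 = the function that was executing when the hook fired
  -- (level 1 is this hook function itself)
  local info = debug.getinfo(2, "Sl")
  if info then
    local where = info.short_src .. ":" .. tostring(info.currentline)
    samples[where] = (samples[where] or 0) + 1
  end
end

local function start_sampler(interval)
  -- empty event mask plus a count makes this a pure count hook
  debug.sethook(sample_hook, "", interval)
end

local function stop_sampler()
  debug.sethook()  -- remove the hook
  return samples
end
```

Walking a fixed few extra levels (debug.getinfo at levels 3, 4, ...) is the middle ground mentioned above.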
Also be careful what you measure. For example, the os.clock() API measures CPU time. Sometimes this is what you want to measure, but it won't help you find slowness that affects wall time but is not CPU time intensive (some blocking APIs will leave the CPU quite idle, but will still take a lot of time).
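To illustrate the distinction with stock Lua APIs (CPU-bound work here; a blocking wait can't be shown portably in plain Lua, so it's only noted in the comment):

```lua
-- os.clock() measures CPU time; os.time()/os.difftime() track wall time.
-- CPU-bound work advances both. A blocking call (e.g. a sleeping
-- syscall) would advance the wall clock while os.clock() barely moves,
-- which is why a CPU-time profiler can miss that kind of slowness.
local cpu0, wall0 = os.clock(), os.time()

local acc = 0
for i = 1, 1e7 do acc = acc + i end  -- burn some CPU

local cpu_elapsed  = os.clock() - cpu0              -- seconds of CPU time
local wall_elapsed = os.difftime(os.time(), wall0)  -- seconds of wall time
```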
- I eventually discarded my experiments with dtrace-like profiling, despite it showing significant promise for certain situations.
"Trace points" were starting to prove very handy as a quick/dynamic (and very low overhead) way to measure time/paths taken in certain segments of code that were executing. Combined with our low overhead sampling profiler's more holistic view of which code was running when pausing periodically, it could save a lot of time quantifying the costs/distributions of different code paths of particular interest. The two approaches (top down and bottom up) were proving complementary.
The best part of it was that it only imposed overhead in proportion to the number of opcodes instrumented and their call frequency, whereas the debug.sethook answers impose a load system-wide that distorts unrelated operations as well.
That said, doing that instrumentation more manually wasn't that difficult/costly, so I decided not to pursue it further.
The other thing that made it less useful than hoped was that some of the code I wanted to instrument this way could be re-entered from multiple coroutines** and/or involved async calling conventions, so it could be difficult to match up which start/stop points went together. Usually there was a place to grab some context to match them up, but not always, and it could get complicated.
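For concreteness, the "more manual" version of such instrumentation can be as simple as the hypothetical sketch below (all names invented); note that the single in-flight timestamp per name is exactly what breaks under the coroutine re-entry described above:

```lua
-- Hypothetical manual "trace point": explicit begin/end timers keyed
-- by name, accumulating total time and hit count.
local tracepoints = {}

local function trace_begin(name)
  local tp = tracepoints[name]
  if not tp then
    tp = { total = 0, count = 0 }
    tracepoints[name] = tp
  end
  tp.count = tp.count + 1
  -- NOTE: one in-flight timestamp per name; if another coroutine
  -- re-enters this region before trace_end fires, the pairing is lost.
  tp.started = os.clock()
end

local function trace_end(name)
  local tp = tracepoints[name]
  if tp and tp.started then
    tp.total = tp.total + (os.clock() - tp.started)
    tp.started = nil
  end
end
```

Usage is just bracketing the region of interest: trace_begin("parse") ... trace_end("parse").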
Anyway, because we had low overhead breakpoints that could run code, the technique I chose, instead of further investment in specialized instrumentation, was breakpoints that ran code to log/collect information. It was higher overhead and more manual, but also much more flexible, so it was a better answer.
So the answer I picked _was_ to do more of it in Lua, and OP_HALT was what made that viable where the debug.sethook-based solutions imposed too much overhead to get accurate results. It is actually still dtrace-like, but using our regular code/logging breakpoints instead of a new type of tracing breakpoint.
The code coverage with self-clearing breakpoints wound up getting shelved as well, but only because it wasn't until later that I pinned down the bug that had been causing it to sporadically crash (one of the fixes I posted in my earlier replies on this thread). I may still resurrect that experiment.
* OP_HALT has no overhead until the opcode is hit; a line/count hook imposes a (non-trivial) cost on every opcode
** IIRC the call/return/tail return hooks only implicitly inform you about coroutine switches via coroutine.running() changing, so be careful of that when assembling call stacks using those hooks
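One way to cope with that (a sketch only, assuming Lua 5.1 semantics where coroutine.running() returns nil on the main thread; hooks are per-coroutine, so each coroutine needs the hook installed on it separately):

```lua
-- Keep a separate shadow call stack per coroutine, keyed on
-- coroutine.running(), rather than assuming one linear call stack.
local stacks = {}

local function call_hook(event)
  local co = coroutine.running() or "main"  -- nil means main thread in 5.1
  local stack = stacks[co]
  if not stack then
    stack = {}
    stacks[co] = stack
  end
  if event == "call" then
    local info = debug.getinfo(2, "n")  -- level 2 = the function being entered
    stack[#stack + 1] = (info and info.name) or "?"
  elseif event == "return" or event == "tail return" then
    stack[#stack] = nil  -- pop the finished frame
  end
end

-- Install with: debug.sethook(call_hook, "cr")
-- ("c" = call events, "r" = return events)
```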