tankxx wrote:
> I try hard to find out how zero-cost-pcall is implemented, but I
> failed. I'm curious about it. May Mike Pall explain it in details?

The pcall() function is implemented as a "fast function", i.e. it
doesn't set up any frame (as explained earlier). It simply does:
  BASE = RA; BASE[-1] = PC; NARGS--; PC = 8|FRAME_PCALL;
And then re-dispatches the call to the passed function. After
that, the stack for a pcall(func, args...) looks like this:

<-------.    ..        .-- BASE
         \  V  \       V
        +=====+=====+-----+-----+
 tag -> | PC  |delta| args| ... | ==>
value-> |pcall|func |     |     | ==>
        +=====+=====+-----+-----+
               delta = 8|FRAME_PCALL (a 1 slot delta)

A normal return from a pcall simply does:
  *--RESULTS = true; NARGS++;
And then continues with the standard return value adjustment.
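
The two snippets above can be played through on a toy stack. This is
only a sketch under stated assumptions, not the actual VM code: the
Slot and VM structs, the toy_* and frame_* helper names, the 8-byte
slot size, the 3-bit type mask and the numeric value of FRAME_PCALL
are all made up for illustration. The second toy_call() is one reading
of "re-dispatches the call to the passed function".

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: every stack slot has a tag half and a value half,
 * like the two rows of the diagram. */
typedef struct Slot {
  uintptr_t tag;   /* frame link: caller PC, or delta | frame type */
  intptr_t value;  /* stand-in for a function or argument */
} Slot;

enum {
  SLOT_SIZE = 8,        /* assumed: one slot is 8 bytes */
  FRAME_TYPE_MASK = 7,  /* assumed: low 3 bits hold the frame type */
  FRAME_PCALL = 6       /* assumed numeric value, illustration only */
};

typedef struct VM {
  Slot *base;    /* BASE */
  uintptr_t pc;  /* PC, or the link to store for the next call */
  int nargs;     /* NARGS */
} VM;

/* Generic call dispatch: BASE = RA; BASE[-1] = PC; */
static void toy_call(VM *vm, Slot *ra) {
  vm->base = ra;
  vm->base[-1].tag = vm->pc;
}

/* pcall as a fast function: no frame of its own, just a link. */
static void toy_pcall(VM *vm, Slot *ra) {
  assert(vm->nargs >= 1);            /* need at least the function */
  toy_call(vm, ra);                  /* BASE = RA; BASE[-1] = PC; */
  vm->nargs--;                       /* func is no longer an argument */
  vm->pc = SLOT_SIZE | FRAME_PCALL;  /* PC = 8|FRAME_PCALL; */
  toy_call(vm, vm->base + 1);        /* re-dispatch to func */
}

/* Decode a delta-style frame link. */
static int frame_type(uintptr_t link) {
  return (int)(link & FRAME_TYPE_MASK);
}
static int frame_delta_slots(uintptr_t link) {
  return (int)((link & ~(uintptr_t)FRAME_TYPE_MASK) / SLOT_SIZE);
}

/* Normal return from a pcall: *--RESULTS = true; NARGS++; */
static void toy_pcall_return(Slot **results, int *nresults) {
  Slot *r = --*results;  /* step back one slot ... */
  r->value = 1;          /* ... and store true there */
  ++*nresults;
}
```

Calling toy_pcall() with RA pointing at the func slot leaves the
caller's PC in the pcall slot and the 1 slot delta in the func slot,
which matches the diagram.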

If an error is raised from a frame higher up, the pcall frame
stops the unwinding and returns false + the error to the caller.
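
The unwinding can be sketched as a walk down the frame links. Again,
this is a simplified model and not the real unwinder: ToyFrame,
ToyResult and toy_throw are hypothetical names, and real frames are
linked through the tag slots rather than through pointers.

```c
#include <stddef.h>

typedef enum { TOY_FRAME_LUA, TOY_FRAME_PCALL } ToyFrameType;

typedef struct ToyFrame {
  ToyFrameType type;
  struct ToyFrame *prev;  /* link to the calling frame */
} ToyFrame;

typedef struct ToyResult {
  int ok;            /* 0 stands for the "false" pcall returns */
  const char *err;   /* the error value */
  ToyFrame *resume;  /* frame that gets the results, NULL if uncaught */
} ToyResult;

/* Unwind from the throwing frame until a pcall frame stops it. */
static ToyResult toy_throw(ToyFrame *top, const char *err) {
  ToyResult r = { 0, err, NULL };
  for (ToyFrame *f = top; f != NULL; f = f->prev) {
    if (f->type == TOY_FRAME_PCALL) {
      r.resume = f->prev;  /* false + error go to pcall's caller */
      return r;
    }
  }
  return r;  /* no pcall frame found: the error is uncaught */
}
```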

Ok, so this already makes pcall() as cheap as any other call for
the interpreter. More magic happens when the code is JIT-compiled:

Recording a pcall() just means recording a call to its first
argument. Since calls are inlined, the on-trace code doesn't have
to set up any frames (i.e. it's zero-cost). Only the snapshot maps
for any intermediate side exits have the pcall frame info.

One invariant is that the on-trace code doesn't throw any errors.
So any error must be raised in some alternative control-flow path,
i.e. it always triggers a side exit. The side exit handler syncs its
state back to the Lua stack (which includes all frame links). Now
the interpreter is restarted with the snapshot PC, the alternative
path gets executed and throws the error, and the same events as
above take place.

If such a side exit ever becomes hot, it will of course be
recorded, too. But the side trace will always be aborted when the
error is thrown. After a couple of such mishaps, the heuristics
decide that it's better to generate machine code for a fast exit
to the interpreter and patch the exit branch with its address.
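
A rough sketch of what such an exit heuristic could look like. The
thresholds, the state names and the toy_take_exit function are
assumptions made up for illustration; the text above only says
"after a couple of such mishaps", and the real counters and limits
may work quite differently.

```c
typedef enum {
  TOY_EXIT_COUNTING,          /* exit is still being profiled */
  TOY_EXIT_PATCHED_TO_INTERP  /* branch goes to the fast exit */
} ToyExitState;

typedef struct ToyExit {
  int hits;    /* times taken since the last recording attempt */
  int aborts;  /* side-trace recordings that were aborted */
  ToyExitState state;
} ToyExit;

enum {
  TOY_HOT_EXIT = 10,  /* hypothetical hotness threshold */
  TOY_MAX_ABORTS = 3  /* hypothetical abort limit */
};

/* Called whenever the side exit is taken. recording_aborts says
 * whether a recording started here would abort (e.g. because the
 * path throws an error). Returns 1 if a recording was attempted. */
static int toy_take_exit(ToyExit *ex, int recording_aborts) {
  if (ex->state == TOY_EXIT_PATCHED_TO_INTERP)
    return 0;  /* already patched: straight to the interpreter */
  if (++ex->hits < TOY_HOT_EXIT)
    return 0;  /* not hot yet */
  ex->hits = 0;
  if (recording_aborts && ++ex->aborts >= TOY_MAX_ABORTS)
    ex->state = TOY_EXIT_PATCHED_TO_INTERP;  /* give up on this exit */
  return 1;
}
```

With these made-up numbers, an exit that always leads to an error
gets three recording attempts and is then patched for good.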

--Mike