- Subject: Re: LuaJIT performance query
- From: Mike Pall <mikelu-0806@...>
- Date: Sun, 15 Jun 2008 22:36:51 +0200
> In the case of math.sin, you know a fair amount about its semantics. But
> does this sort of table access lifting run into trouble in the presence
> of storage allocations which could trigger GC work which could run __gc
> metamethods which could muck with the contents of global tables? (I'm
> not saying that they should, but they could.)
I'm currently handling this by just dropping to the interpreter if
the GC needs to be driven forward. This is in the hope that the GC
threshold has not been reached in the common case. I think this is
not an unreasonable restriction since the current memory allocator
is dead slow anyway. So if you're going to allocate and collect
tons of objects, this time will dominate anything else in the loop
a JIT compiler could possibly speed up.
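The fallback described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not LuaJIT's actual code: ToyGC, compiled_alloc_loop and interp_alloc_loop are hypothetical names, and the toy "collection" simply frees everything.

```c
#include <stddef.h>

/* Hypothetical, minimal GC accounting state. */
typedef struct {
    size_t total;      /* bytes currently allocated */
    size_t threshold;  /* the GC must be driven forward once total reaches this */
} ToyGC;

typedef enum { LOOP_DONE, LOOP_EXIT_GC } LoopStatus;

/* "Compiled" fast path: allocates, but never runs the collector itself.
 * As soon as the threshold is reached it leaves the loop, and *done tells
 * the caller how many iterations have been completed so far. */
static LoopStatus compiled_alloc_loop(ToyGC *gc, int n, size_t obj_size, int *done) {
    while (*done < n) {
        if (gc->total >= gc->threshold)
            return LOOP_EXIT_GC;       /* drop to the interpreter */
        gc->total += obj_size;
        (*done)++;
    }
    return LOOP_DONE;
}

/* "Interpreter" slow path: may run the collector at any allocation.
 * Assumption for this toy: a collection frees everything. */
static void interp_alloc_loop(ToyGC *gc, int n, size_t obj_size, int *done) {
    while (*done < n) {
        if (gc->total >= gc->threshold)
            gc->total = 0;             /* stand-in for a real GC step */
        gc->total += obj_size;
        (*done)++;
    }
}
```

A caller would try compiled_alloc_loop first and, on LOOP_EXIT_GC, hand the same done counter to interp_alloc_loop to finish the remaining iterations.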
I will probably need to revise this if I get around to redesigning
the memory allocator and the GC. Not in the first release, though.
BTW: abstractly seen, the need to run __gc metamethods inside a
loop only arises if userdata objects are continuously allocated
inside the same loop, too. This would necessitate calling an
external C function. But the trace recorder aborts recording if it
encounters a call to a function with arbitrary side effects.
Even if I later add such a capability, most optimizations would
have to be turned off across such calls. I'm not sure the added
complexity pays off. The interpreter is written in assembler and
is already quite fast (2x-4x over plain Lua).
> It seems like a JIT would
> benefit from some way to have JIT'd code be listed as being dependent on
> a particular table and have it get discarded in the event that anyone
> writes to that table.
I thought so, too. At least initially. But I've experimented with
quite a few cache invalidation mechanisms, and have come to the
conclusion that these don't pay off. At least not for the compiler
I'm writing now.
Seen from another level, the SSA IR generated by the trace
recorder is just a pure functional representation of a part of the
same program. No side-effects, no mutable stores -- greatly
simplifies analysis. Anything outside the trace is handled by side
exits, anything inside the trace can be completely analyzed.
This means there cannot be any "impure" writes inside a loop. And
all preconditions are checked just once before the loop. Yes, this
is a very simple approach. But I don't think it's slower than
handling complex and possibly interdependent cache invalidations.
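As a rough illustration of "check once, then run a pure loop", here is a small C sketch. It is not how LuaJIT emits code; ToyFnTable, builtin_half, other_twice, trace_sum and interp_sum are hypothetical names invented for the example. The single table slot stands in for something like math.sin looked up in a global table.

```c
#include <stddef.h>

typedef double (*unary_fn)(double);

/* Hypothetical stand-in for a global table with one function slot. */
typedef struct { unary_fn fn_slot; } ToyFnTable;

/* The function the trace was specialized for, and a replacement
 * someone might store into the slot later. */
static double builtin_half(double x) { return x * 0.5; }
static double other_twice(double x)  { return x * 2.0; }

typedef enum { TRACE_DONE, TRACE_SIDE_EXIT } TraceStatus;

/* "Compiled" trace for: for i = 1, n do sum = sum + t.fn(i) end
 * The precondition (the slot still holds builtin_half) is checked once
 * before the loop. The loop body contains no stores to the table, so the
 * check stays valid for every iteration. */
static TraceStatus trace_sum(const ToyFnTable *t, int n, double *sum) {
    if (t->fn_slot != builtin_half)
        return TRACE_SIDE_EXIT;            /* precondition failed: leave the trace */
    double acc = *sum;
    for (int i = 1; i <= n; i++)
        acc += builtin_half((double)i);    /* direct call, no table lookup */
    *sum = acc;
    return TRACE_DONE;
}

/* Generic fallback: reloads the slot on every iteration. */
static void interp_sum(const ToyFnTable *t, int n, double *sum) {
    for (int i = 1; i <= n; i++)
        *sum += t->fn_slot((double)i);
}
```

If the slot has been overwritten (say with other_twice), trace_sum leaves the sum untouched and returns TRACE_SIDE_EXIT, and the caller falls back to interp_sum. No invalidation bookkeeping is needed on the write side.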
[ Ok, there's one instance of negative caching in Lua (and LuaJIT)
where a table stores a bit that it does not hold certain keys (a
couple of metamethod names). This cache is cleared on every write
and rebuilt when the metamethod lookup fails. This turns out to be
very effective, since metatables are usually never modified after
they are attached to an object. ]
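The negative cache in the bracketed note can be sketched as follows. This is a simplified toy, not the actual Lua source: ToyTable, toy_set, toy_find and toy_metamethod are hypothetical names, and the linear key array only stands in for a real hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event indices; each one gets a bit in the cache byte. */
enum { EV_INDEX, EV_NEWINDEX, EV_EQ, EV_COUNT };

static const char *const ev_name[EV_COUNT] = { "__index", "__newindex", "__eq" };

#define MAX_SLOTS 8

typedef struct {
    const char *key[MAX_SLOTS];  /* toy storage: linear array of string keys */
    int value[MAX_SLOTS];
    int used;
    unsigned char absent;        /* bit e set => ev_name[e] is known to be missing */
    int slow_lookups;            /* instrumentation for this example only */
} ToyTable;

/* Full key lookup. Returns a pointer to the value, or NULL if absent. */
static int *toy_find(ToyTable *t, const char *k) {
    for (int i = 0; i < t->used; i++)
        if (strcmp(t->key[i], k) == 0)
            return &t->value[i];
    return NULL;
}

/* Every write clears the whole negative cache, whatever key is written. */
static void toy_set(ToyTable *t, const char *k, int v) {
    t->absent = 0;
    int *slot = toy_find(t, k);
    if (slot != NULL) { *slot = v; return; }
    assert(t->used < MAX_SLOTS);
    t->key[t->used] = k;
    t->value[t->used] = v;
    t->used++;
}

/* Metamethod lookup: a cached miss is answered from the bit alone.
 * The bit is (re)built only when a full lookup fails. */
static int *toy_metamethod(ToyTable *mt, int ev) {
    assert(ev >= 0 && ev < EV_COUNT);
    if (mt->absent & (1u << ev))
        return NULL;                       /* fast path: known to be missing */
    mt->slow_lookups++;
    int *v = toy_find(mt, ev_name[ev]);
    if (v == NULL)
        mt->absent |= (unsigned char)(1u << ev);
    return v;
}
```

Because a metatable is rarely written after it is attached, nearly all failed lookups hit the fast path, and the blanket "clear on every write" costs almost nothing.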
While thinking about adding some complex mechanism, I'm always
reminded of this quote:
"The price of reliability is the pursuit of the utmost simplicity."
-- C. A. R. Hoare
> I think I just partly answered my own question about the potential ill
> effects of __gc metamethods on caching. Since the specification doesn't
> say exactly when they run, it's probably possible to assume that all of
> the code that depends on the cache executes ahead of the __gc routine. I
> think that only holds up though if there are no loads or stores in the
> loop since otherwise the loop code could interact with the __gc code.
Well, my perspective on it (as a VM designer) is that __gc
metamethods shall not have any visible side-effects or their
effect on execution is undefined. As you've already pointed out,
there is no deterministic guarantee that they run before, inside
or after a particular loop.
I presume that if we had an authoritative Lua language spec (and
not just the docs for a particular implementation) it would
certainly make clear that __gc metamethods must not have any
visible side-effects on code that may run them indirectly by
driving the GC forward. E.g. they must not throw an error (it's
not caught and leaks through unrelated code -- happy debugging).