On Thu, Nov 18, 2010 at 6:17 PM, Mike Pall <mikelu-1011@mike.de> wrote:
> Petri Häkkinen wrote:
> > Would it help if I counted how many temp vectors a real commercial game
> > written in C++ creates each frame? Are there any other statistics that would
> > help in estimating the impact?
>
> This would certainly be an interesting number to know. Please do!
>
> But such a number says little about how many of the allocations
> the compiler could optimize away. For that, I'd need to see the
> actual code for the top allocation sites (the whole call chain
> where the vectors are passed). Abstract code that only shows the
> data-flow would suffice.


Ok, here are the results from Alan Wake, averaged over time across multiple scenes:

Vector creation ~100000
Arith ~30000
Normalization ~2500

The values were gathered by adding instrumentation code to various vector operations; GPU, physics and DirectX code is not included. Values are hits per frame and include 2D, 3D and 4D vectors.

The vector creation count seems very high, but most of these are temps, which the C++ compiler should be able to optimize away most of the time. Arith ops include vector addition, subtraction and multiplication. About 50% of arith ops are in-place operations (e.g. a += b). The majority of vector normalizations use fast approximations.
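
To give an idea of what such instrumentation can look like, here is a minimal C++ sketch. The Vec3 type, the VecStats counters and every function name are hypothetical and not taken from the Alan Wake code base. Only construction from components is counted; plain copies are ignored.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical per-frame counters (illustrative names only).
struct VecStats {
    std::uint64_t created = 0;     // vectors constructed from components
    std::uint64_t arith = 0;       // +, -, * including the in-place forms
    std::uint64_t in_place = 0;    // subset of arith: operations like a += b
    std::uint64_t normalized = 0;  // normalization calls
};

VecStats g_stats;

struct Vec3 {
    float x, y, z;

    // Every component-wise construction counts as one "vector creation".
    // The implicit copy constructor is left uninstrumented on purpose.
    Vec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) { ++g_stats.created; }

    // In-place add: counted as an arith op, creates no new vector.
    Vec3& operator+=(const Vec3& o) {
        ++g_stats.arith;
        ++g_stats.in_place;
        x += o.x; y += o.y; z += o.z;
        return *this;
    }
};

// Each of these counts one arith op and creates one temporary.
Vec3 operator+(const Vec3& a, const Vec3& b) {
    ++g_stats.arith;
    return Vec3(a.x + b.x, a.y + b.y, a.z + b.z);
}

Vec3 operator-(const Vec3& a, const Vec3& b) {
    ++g_stats.arith;
    return Vec3(a.x - b.x, a.y - b.y, a.z - b.z);
}

Vec3 operator*(const Vec3& a, float s) {
    ++g_stats.arith;
    return Vec3(a.x * s, a.y * s, a.z * s);
}

// Exact normalization; a real engine might use a fast approximation instead.
Vec3 normalize(const Vec3& v) {
    ++g_stats.normalized;
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    float inv = len > 0.0f ? 1.0f / len : 0.0f;
    return Vec3(v.x * inv, v.y * inv, v.z * inv);
}

// Call once per frame: returns the counts gathered so far and resets them.
VecStats end_frame() {
    VecStats s = g_stats;
    g_stats = VecStats{};
    return s;
}

// Tiny stand-in for one frame of game code.
VecStats simulate_frame() {
    g_stats = VecStats{};
    Vec3 pos(0.0f, 0.0f, 0.0f);
    Vec3 vel(1.0f, 2.0f, 2.0f);
    Vec3 step = vel * 0.5f;           // one arith op, one temp
    pos += step;                      // in-place, no temp
    Vec3 dir = normalize(pos - vel);  // one arith op, two temps
    (void)dir;
    return end_frame();
}
```

Averaging the returned counts over many frames gives per-frame figures of the kind listed above. Note that the counters see every temp in the source, including ones an optimizing compiler would eliminate.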

Overall I think vector manipulation in AW could be optimized manually quite a bit if needed. Luckily vectors are almost always allocated on the stack or stored in data members of C++ objects, so allocation cost is negligible.

Vectors are scattered all over the code base, so it's almost impossible to get good statistics on the data-flow. Most of the vecs are probably very short-lived. So yeah, even with everything optimized, I would say that there will be at least a couple of thousand temp vector allocs per frame in a game as complex as AW.

> > Ok, too bad. But this can't mean that it's impossible to have values with
> > different memory footprints in a dynamic language, does it?
>
> Think about it that way: it requires an indirection in any case.
> But this is not much of an issue. Also, it doesn't matter whether
> it points to memory on the heap or the stack or wherever.
>
> The real issue is how and when to dispose of or recycle the memory
> that holds the oversized values. The traditional approach in
> memory-safe languages is to use allocation/GC. Holding values in
> the stack is not memory-safe, unless you can prove these values
> can never escape. The same logic applies, whether it's on the C
> stack or some kind of auxiliary part of the Lua value stack that
> gets cleaned up on return.
>
> So eliminating the allocation requires escape analysis, anyway.
> Ditto for store sinking + allocation sinking, which is strictly
> more powerful (this works, even when the value escapes only in
> some uncommonly executed part of the code). I'm aiming for the
> latter, but this will take a lot of time to design and implement
> (maybe after the next beta release).
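
For illustration, the distinction between a temp that never escapes and one that escapes only on a rare path can be written out by hand in C++. This is only an analogy under my own assumptions: a JIT would derive this automatically, and the names below (Vec3, g_kept, both functions) are hypothetical.

```cpp
#include <memory>
#include <vector>

struct Vec3 { float x, y, z; };

// Storage that outlives any single call. Anything stored here has "escaped".
std::vector<std::unique_ptr<Vec3>> g_kept;

// Case 1: the temp never leaves the function, so it can live on the stack
// (or in registers) and needs no heap allocation at all.
float sum_length_sq(const Vec3& a, const Vec3& b) {
    Vec3 tmp{a.x + b.x, a.y + b.y, a.z + b.z};
    return tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z;
}

// Case 2: the temp escapes only on an uncommon path. The heap allocation is
// pushed down ("sunk") into that branch, so the common path stays free of
// allocations. Plain escape analysis would have to give up here, because the
// value can escape on some path.
float sum_length_sq_keep_if(const Vec3& a, const Vec3& b, bool keep) {
    Vec3 tmp{a.x + b.x, a.y + b.y, a.z + b.z};
    if (keep) {
        // Rare path: only now does the value need storage that outlives the call.
        g_kept.push_back(std::make_unique<Vec3>(tmp));
    }
    return tmp.x * tmp.x + tmp.y * tmp.y + tmp.z * tmp.z;
}
```

Calling the second function with keep set to false performs no allocation, just like the first.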


Ok, I see the problem. I'll get back to this if I get any new ideas.

With value semantics ruled out for now, what is your recommendation for a vector implementation with the current version of LuaJIT? Userdata + C operations, or a Lua-only implementation with tables? Presumably the FFI is the way to go in the future, or is it? I would like to keep the functional syntax if at all possible.

Petri