Hi Wolfgang

On 6/02/2012 11:48 AM, Wolfgang Pupp wrote:
> I just tried another approach:
> A circular buffer for intermediate results- there are no more
> allocations for calling arithmetic functions, *BUT* if you don't copy
> results (and just keep references instead), they *will* be overwritten
> (sooner or later).
> I didn't find any cheap and easy way to check how many times an object
> is still referenced, and I think this is a pretty decent usability vs.
> speed tradeoff.

I asked a related question back in October. The thread is called "rapid lifetime based collection of transient user data objects?" if you want to look it up.

A number of approaches were discussed.

LuaPlus includes a patch that adds eager reference-counted collection of objects. I think that might give you the behavior you want when combined with a freelist of vectors. I've no idea what it would take to add reference-counted GC to LuaJIT, although I'd be very interested to hear what Mike thinks about adding this feature.
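
To make the freelist part concrete, here is a minimal plain-Lua sketch. The names acquire and release are hypothetical and are not part of LuaPlus or LuaJIT. With a refcounting collector, release would be triggered when the count drops to zero; here it is called by hand, which is an assumption made only so the sketch runs on stock Lua.

```lua
-- Hypothetical free list of 4-element vectors (plain tables stand in
-- for the FFI vectors so this runs without LuaJIT).
local free = {}

-- Take a vector from the free list if one is available, else allocate.
local function acquire(a, b, c, d)
  local v = table.remove(free) or {}
  v[1], v[2], v[3], v[4] = a or 0, b or 0, c or 0, d or 0
  return v
end

-- Return a vector to the free list; the caller must drop its reference.
local function release(v)
  free[#free + 1] = v
end

local v1 = acquire(1, 2, 3, 4)
release(v1)                      -- done with v1, hand it back
local v2 = acquire(5, 6, 7, 8)   -- reuses the same table, no new allocation
print(v1 == v2)                  -- true
```

The point is only that recycled objects avoid fresh allocations; knowing when it is safe to call release is exactly what the refcount patch would provide.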

I'm not sure what benefit a circular buffer gives you over generational GC, and it carries the risk of wrapping around the buffer while a result is still in use, which introduces hard-to-debug errors.
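
A small plain-Lua sketch of the failure mode I mean. RING_SIZE, ring, next_slot and add are made-up names for illustration, and plain tables stand in for the SSE vectors; this is not your module, just the same idea reduced to something that runs anywhere.

```lua
-- Ring of reusable result slots, as in the circular-buffer approach.
local RING_SIZE = 4
local ring, next_slot = {}, 1
for i = 1, RING_SIZE do ring[i] = {0, 0, 0, 0} end

-- Element-wise add that writes its result into the next ring slot.
local function add(a, b)
  local tmp = ring[next_slot]
  for i = 1, 4 do tmp[i] = a[i] + b[i] end
  next_slot = next_slot % RING_SIZE + 1  -- stay within 1..RING_SIZE
  return tmp
end

local one = {1, 1, 1, 1}
local ten = {10, 10, 10, 10}

local kept = add(one, one)   -- reference kept without copying
local first = kept[1]        -- value seen right after the add

-- RING_SIZE further additions wrap around and reuse kept's slot.
for _ = 1, RING_SIZE do add(ten, ten) end

print(first, kept[1])        -- 2   20
```

Nothing raises an error here: the kept reference silently changes value, which is what makes this kind of bug hard to track down.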

I also have an application in mind where keeping the cache free of bazillions of temporary vectors is desirable.

I'm not sure this is at all possible to add to the Lua VM, but another approach might be a flag in the userdata tags that indicates whether the userdata object has multiple references or has only been referenced once, on the Lua stack. If it has only been on the Lua stack once, it could be collected/finalised as soon as it goes out of scope (popped from the stack); otherwise it is subject to normal GC policies. That wouldn't be as strong as the LuaPlus refcount patch, but it might work for temporaries.



I also tried to make it use SSE, and that seems to work just fine
(MinGW on Win7 32). It needs a tiny wrapper DLL, because LuaJIT can't
directly call FFI functions with vector arguments (yet!), so I pass
them via pointers.
I only implemented single-precision 4-float addition for now anyway;
it's just a proof of concept. I think FFI vector operations are
somewhere on Mike's TODO list (maybe someone will even sponsor that
and we'll have it in a blink ;).

Every kind of opinion/feedback is much appreciated. I'm learning,
after all, and sometimes even a plain "you're doing it wrong" helps a
LOT ;)

It's used like this:
local v4sf = require 'v4sf'
local a, b = v4sf(1, 2, 3, 4), v4sf(1, 0, 1, 0)
print(a+b)  -- intermediate result, don't keep a reference for too long!
local c = v4sf(a+b)  -- copy when you *need* to keep the value

Here is the code:
local M = {_NAME = 'v4sf', _VERSION = '0.1', _DESCRIPTION = [[
Module for 4d-vectors, single precision.
Vectors are immutable once constructed.
Vectors returned by metamethods (addition, etc.) are only temporary values and
_MUST_ be copied if you want to keep a reference to them.
Construct new vectors by calling the module table itself or its new function.
- new(<number> a, <number> b, <number> c, <number> d)
   returns <v4sf>  A vector with elements a, b, c, d
- new(<v4sf> v)
   returns <v4sf>  A copy of vector v
- <v4sf>: A vector with 4 (single-precision-float) elements.
   - __add(<v4sf> a, <v4sf> b) returns <v4sf>  The sum of a and b
   - __tostring(<v4sf> v)
     returns <string>  Temporary values are marked with a 'tmp ' prefix.
]]}

local ffi = require 'ffi'
--[[Can't call SSE functions directly (LuaJIT NYI), so we have to cheat a bit.
lua_sse.c is compiled like this (MinGW gcc):
   gcc -O2 -msse -shared -o lua_sse.dll lua_sse.c
and should look like this:
   #include <xmmintrin.h>
   void lua_mm_add_ps(__m128 *r, __m128 *a, __m128 *b) { *r = _mm_add_ps(*a, *b); }
]]
ffi.cdef [[
typedef float m128 __attribute__ ((__vector_size__ (16)));
void lua_mm_add_ps(m128 *r, m128 *a, m128 *b);
]]
local sse = ffi.load 'lua_sse'

local metatable
local ctype = ffi.typeof 'm128[1]'
local cBuffer = {}    --circular buffer of reusable temporaries
local cBufferIdx = 1  --next slot to hand out (1-based)
local function new_tmp(a, b, c, d)
  return setmetatable({tmp = true, cdata = ctype{{a, b, c, d}}}, metatable)
end
function M.new(a, b, c, d)
  if type(a) == 'table' and getmetatable(a) == metatable then
    --copy constructor
    return setmetatable({cdata = ctype{{a[0], a[1], a[2], a[3]}}}, metatable)
  end
  return setmetatable({cdata = ctype{{a, b, c, d}}}, metatable)
end
metatable = {
  __tostring = function(v)
    if v.tmp then
      return ("tmp v4sf: %g,%g,%g,%g"):format(v.cdata[0][0],
        v.cdata[0][1], v.cdata[0][2], v.cdata[0][3])
    end
    return ("v4sf: %g,%g,%g,%g"):format(v.cdata[0][0],
      v.cdata[0][1], v.cdata[0][2], v.cdata[0][3])
  end,
  __add = function(a, b)
    local tmp = cBuffer[cBufferIdx]
    sse.lua_mm_add_ps(tmp.cdata, a.cdata, b.cdata)
    --wrap within 1..#cBuffer; (idx+1) % n would yield 0 and index a nil slot
    cBufferIdx = cBufferIdx % #cBuffer + 1
    return tmp
  end,
  __index = function(v, k)
    if type(k) == 'number' and k >= 0 and k < 4 then
      return v.cdata[0][k]
    end
  end,
  __newindex = function(v, k, newValue)
    error "Assigning to vectors is not allowed"
  end,
}
for i = 1, 10 do cBuffer[i] = new_tmp() end
setmetatable(M, {
  __call = function(_, a, b, c, d) return M.new(a, b, c, d) end,
  __tostring = function(m) return m._DESCRIPTION end,
})

return M