lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



On 19-Aug-04, at 8:47 PM, Edgar Toernig wrote:

IMHO that's the most important point.  If you accept heap allocation
during vararg-function invocation you can stick with tables.  The
slight performance advantage of your tuple implementation could
be nullified by optimizing the appropriate parts of table creation.

It's not that slight. You might be surprised. Here's a quick pseudo-benchmark;
OS is FreeBSD 5.2.1, compiled with -O2 -fomit-frame-pointer

The tuple() function is the C version the patch for which is in an earlier message.

I didn't hack the syntax in, so the test function calls tuple on the arguments;

-- curried function expects a tuple
> function tcurry(fn, a) return function(tup) return fn(a, tup()) end end
-- standard varargs
> function vcurry(fn, a) return function(...) return fn(a, unpack(arg)) end end
-- fixed arguments as a baseline
> function fcurry(fn, a) return function(b, c, d) return fn(a, b, c, d) end end
-- simple test function with four arguments
> function f(a, b, c, d) return a, b, c, d end
-- Unfortunately, somewhat imprecise ansi-standard clock
> function time(f, n)
>> local now = os.clock() for i = 1, n do f() end return os.clock() - now
>> end

> return time(function() local g = fcurry(f, 1); return g(2, 3, 4) end, 1e7)
18.421875
> return time(function() local g = tcurry(f, 1); return g(tuple(2, 3, 4)) end, 1e7)
23.59375
> return time(function() local g = vcurry(f, 1); return g(2, 3, 4) end, 1e7)
35.921875

So tupling is about 30% slower than the baseline, and varargs is about twice as slow.

That was promising enough to make it worthwhile testing for real: I changed adjust_varargs in ldo.c to the following: (this is 5.0.2)

static void adjust_varargs (lua_State *L, int nfixargs, StkId base) {
  int i;
  Closure *cl;
  int actual = L->top - base;  /* actual number of arguments */
  if (actual < nfixargs) {
    luaD_checkstack(L, nfixargs - actual);
    for (; actual < nfixargs; ++actual)
      setnilvalue(L->top++);
  }
  actual -= nfixargs;  /* number of extra arguments */
  cl = luaF_newCclosure(L, actual);  /* create `arg' table */
  cl->c.f = lua_pushupvalues;
  for (i=0; i<actual; i++)  /* put extra arguments into `arg' table */
    setobj2n(&cl->c.upvalue[i], L->top - actual + i);
  L->top -= actual;  /* remove extra elements from the stack */
  setclvalue(L->top, cl);
  incr_top(L);
}

which makes arg into a tuple instead of a table. With that modification, I got a tiny speed improvement (but the syntax is better):

> function tcurry(fn, a) return function(...) return fn(a, arg()) end end
> function f(a, b, c, d) return a, b, c, d end
> function time(f, n)
>> local now = os.clock() for i = 1, n do f() end return os.clock() - now
>> end
> return time(function() local g = tcurry(f, 1); return g(2, 3, 4) end, 1e7)
23.2109375

In short, for about 6 lines of code changed, a 35% speedup. It's hard to imagine that the ... thing would be faster than the baseline, and presumably the extra copying and stack management would take some time, so we could guess that the alternative is considerably more change to the code base for a 45% speedup. And I still say that tuples have extra uses.

The currying is supposed to create a new object, isn't it?

Yes, but it might be a very short-lived object. Iterators and
mapping functions spring to mind.

Hmmm... nitpicking: from the GC point of view a table is one
object (with one to three malloced areas).

Well, it's only one object to mark but it's three objects to malloc()
and three objects to free(). I suspect that both malloc() and free()
take a lot more time than marking does. That's an implementation detail,
too, though. In fact, most of this is.

No.  It's a conceptual difference.  The ... "hack" is designed
to not allocate heap objects, the tuple solution is designed
to allocate a new object.

Whether an object is constructed on the heap or the stack is not
a conceptual difference, it is an implementation detail.

Afaics, only the syntax gives anything new.  The tuples themself
give nothing new.  They are only a subset of tables.

I suppose that is true. I guess I like the syntax :)

tuples differ in two ways from tables: one is that they are immutable
(in my proposal) which, in an imaginable implementation, would make them
internable and therefore quite different from tables. (Or, even if not
interned, compared by value and not by object identity.) The other
difference is that there size is fixed and known.

I did.  But it's not the tuple that makes it possible to collect
multiple return value but the new syntax.

Fair enough. One could equally well propose the syntax:

  a, b, {c} = f()

I would have no objections to that, really.

And I wouldn't call it a "considerable complication to the Lua core".
I can't speak about Lua5 but I added them to Sol (Lua4 based) and
it's not that hard (I simply move the varargs to the up-to-now
unused upper end of the stack).

That seems like a reasonable implementation technique.