Re: Lua 5.1+ and variable-argument tables

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: Lua 5.1+ and variable-argument tables
From: Rici Lake <lua@...>
Date: Thu, 19 Aug 2004 22:11:34 -0500


On 19-Aug-04, at 8:47 PM, Edgar Toernig wrote:

IMHO that's the most important point.  If you accept heap allocation
during vararg-function invocation you can stick with tables.  The
slight performance advantage of your tuple implementation could
be nullified by optimizing the appropriate parts of table creation.

It's not that slight. You might be surprised. Here's a quickpseudo-benchmark;

OS is FreeBSD 5.2.1, compiled with -O2 -fomit-frame-pointer

The tuple() function is the C version the patch for which is in anearlier message.

I didn't hack the syntax in, so the test function calls tuple on thearguments;


-- curried function expects a tuple

> function tcurry(fn, a) return function(tup) return fn(a, tup()) endend

-- standard varargs

> function vcurry(fn, a) return function(...) return fn(a, unpack(arg))end end

-- fixed arguments as a baseline

> function fcurry(fn, a) return function(b, c, d) return fn(a, b, c, d)end end

-- simple test function with four arguments
> function f(a, b, c, d) return a, b, c, d end
-- Unfortunately, somewhat imprecise ansi-standard clock
> function time(f, n)

>> local now = os.clock() for i = 1, n do f() end return os.clock() -now

>> end

> return time(function() local g = fcurry(f, 1); return g(2, 3, 4) end,1e7)

18.421875

> return time(function() local g = tcurry(f, 1); return g(tuple(2, 3,4)) end, 1e7)

23.59375

> return time(function() local g = vcurry(f, 1); return g(2, 3, 4) end,1e7)

35.921875

So tupling is about 30% slower than the baseline, and varargs is abouttwice as slow.

That was promising enough to make it worthwhile testing for real: Ichanged adjust_varargs in ldo.c to the following: (this is 5.0.2)


static void adjust_varargs (lua_State *L, int nfixargs, StkId base) {
  int i;
  Closure *cl;
  int actual = L->top - base;  /* actual number of arguments */
  if (actual < nfixargs) {
    luaD_checkstack(L, nfixargs - actual);
    for (; actual < nfixargs; ++actual)
      setnilvalue(L->top++);
  }
  actual -= nfixargs;  /* number of extra arguments */
  cl = luaF_newCclosure(L, actual);  /* create `arg' table */
  cl->c.f = lua_pushupvalues;
  for (i=0; i<actual; i++)  /* put extra arguments into `arg' table */
    setobj2n(&cl->c.upvalue[i], L->top - actual + i);
  L->top -= actual;  /* remove extra elements from the stack */
  setclvalue(L->top, cl);
  incr_top(L);
}

which makes arg into a tuple instead of a table. With thatmodification, I got a tiny speed improvement (but the syntax isbetter):

> function tcurry(fn, a) return function(...) return fn(a, arg()) endend

> function f(a, b, c, d) return a, b, c, d end
> function time(f, n)

>> local now = os.clock() for i = 1, n do f() end return os.clock() -now

>> end

> return time(function() local g = tcurry(f, 1); return g(2, 3, 4) end,1e7)

23.2109375

In short, for about 6 lines of code changed, a 35% speedup. It's hardto imagine that the ... thing would be faster than the baseline, andpresumably the extra copying and stack management would take some time,so we could guess that the alternative is considerably more change tothe code base for a 45% speedup. And I still say that tuples have extrauses.

The currying is supposed to create a new object, isn't it?


Yes, but it might be a very short-lived object. Iterators and
mapping functions spring to mind.

Hmmm... nitpicking: from the GC point of view a table is one
object (with one to three malloced areas).


Well, it's only one object to mark but it's three objects to malloc()
and three objects to free(). I suspect that both malloc() and free()
take a lot more time than marking does. That's an implementation detail,
too, though. In fact, most of this is.

No.  It's a conceptual difference.  The ... "hack" is designed
to not allocate heap objects, the tuple solution is designed
to allocate a new object.


Whether an object is constructed on the heap or the stack is not
a conceptual difference, it is an implementation detail.

Afaics, only the syntax gives anything new.  The tuples themself
give nothing new.  They are only a subset of tables.


I suppose that is true. I guess I like the syntax :)

tuples differ in two ways from tables: one is that they are immutable
(in my proposal) which, in an imaginable implementation, would make them
internable and therefore quite different from tables. (Or, even if not
interned, compared by value and not by object identity.) The other
difference is that there size is fixed and known.

I did.  But it's not the tuple that makes it possible to collect
multiple return value but the new syntax.


Fair enough. One could equally well propose the syntax:

  a, b, {c} = f()

I would have no objections to that, really.

And I wouldn't call it a "considerable complication to the Lua core".
I can't speak about Lua5 but I added them to Sol (Lua4 based) and
it's not that hard (I simply move the varargs to the up-to-now
unused upper end of the stack).


That seems like a reasonable implementation technique.

Follow-Ups:
- Re: Lua 5.1+ and variable-argument tables, Mark Hamburg
- Re: Lua 5.1+ and variable-argument tables, Edgar Toernig

References:
- Re: Lua 5.1+ and variable-argument tables, Wim Couwenberg
- Re: Lua 5.1+ and variable-argument tables, Rici Lake
- Re: Lua 5.1+ and variable-argument tables, Edgar Toernig
- Re: Lua 5.1+ and variable-argument tables, Rici Lake
- Re: Lua 5.1+ and variable-argument tables, Edgar Toernig

Prev by Date: Re: Lua 5.1+ and variable-argument tables
Next by Date: Re: Lua 5.1+ and variable-argument tables
Previous by thread: Re: Lua 5.1+ and variable-argument tables
Next by thread: Re: Lua 5.1+ and variable-argument tables
Index(es):
- Date
- Thread