lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


[Warning: Long/rambly; TL;DR:  heavy use of varargs as the main
control flow construct is fine, acceptably fast, and very flexible.
Lua 5.4 changes all varargs to tables, but it's pretty much exactly as
fast as it was before – I'm happy with the situation so far.]

On 2018-01-09 02:55, 云风 Cloud Wu wrote:
Roberto Ierusalimschy <roberto@inf.puc-rio.br <mailto:roberto@inf.puc-rio.br>>于2018年1月8日周一 下午8:36写道:
We do not see vararg functions in critical paths often. Several
(most?) vararg functions have to create tables (or pay the price
of 'select') anyway.
Can this one avoid vararg create tables in lua 5.4?

function print_with_tag(tag)

return function(...) return print(tag, ...) end

end
Oh, right:  Thin wrapper functions that just pass on any number of
arguments without ever looking at them (often adding/removing stuff at
the front) are pretty common in my code.  A good example is

    function map( t, f, ... )
       local u = { }
       for k, v in pairs( t ) do  u[k] = f( v, ... )  end
       return setmetatable( u, getmetatable( t ) )
    end

This definition is convenient for chaining:  `map( t, map, f, ... )`
will apply `f` "two levels deep" in `t`.  Given some more functions of
this kind…

     -----

[No need to read all of this – it's just included for (near-)runnable
examples (data missing) – so skim & skip ahead!]

    -- nicer layout: write `go( x, f, g, h, … )` for `f( x, g, h, … )`
    function go( t, f, ... )  return f( t, ... )  end
    -- say `…, method, "foo", ...` if you need `x:foo( ... )`
    function method( x, mname, ... )  return x[mname]( x, ... )  end
    -- say `…, at, k, f, ...` if you need `f( x[k], ... )`
    function at( x, k, f, ... )  return f( x[k], ... )  end
    -- map without result / only traverse (faster if modifying in-place)
    function map_(t,f,...)  for k,v in pairs(t) do  f(v,...)  end  end
    -- map-then-reduce
    function rmap( t, red, ... )  return red( map( t, ... ) )  end
    -- fold a.k.a. reduce --> f( …f( f( z, v1 ), v2 )…, vN )
    function fold( t, z, f, ... )
       for k, v in pairs( t ) do  z = f( z, v, ... )  end
       return z
    end
    -- "generic" sum --> v1 + v2 + … + vN
    do
       local function add( z, v )  return z+v  end
       local ZERO = setmetatable({},{__add=function(_,v) return v end})
       function sum( t )  return fold( t, ZERO, add )  end
    end

…you can write really concise code (particularly data processing):

map_( files,  method,"write",  os.date "%FT%T", "\t", esc(msg), "\n" )

(Given a bunch of files, write to each a line containing the current
datetime / tab / some message `msg`.  No need for explicit caching –
same timestamp for all & not running the message escaping per file,
unlike for loops where you have to manually reorder things to get that.)

map_( states,  at,"log",  method,"write",  ... )

(Same, one level deeper: `state.log:write( ... )` foreach in states.)

stats = go( data,  map,  rmap,sum,  rmap,sum,  at,"samples",  summary )

(If `data[group][entity][n].samples[i]` is the `i`-th measurement in the
`n`th run of `entity` in `group`, get per-group stats, i.e. get
`stats[group] = { min/max/mean/avg/… }` by running `summary` on each
of the innermost lists of measurements, then combining over all runs and
entities, but keeping groups separate.  This may be rather hard to read
at first, but for me it sure beats a dozen lines or more of explicit
nested for loops and all that stuff…)

     -----

In essence, I'm using varags as some kind of a weird high/low level
"programmable instruction stream".  (And these examples don't really use
the "programmable" part yet… you have a raw token / instruction stream,
there's no need to parse anything or deal with trees/nesting, … – just
do whatever! ^,^)  This style of chaining transformations seems to
encourage higher levels of abstraction, because you can define or
generate the parts any way you like.  (You're only setting up the big
pipeline once – so closures etc. are cheap.  With normal functions /
closures, you can't poke at the innards and so you're forced to wrap a
wrapper around the wrapper around… or use string processing or other
forms of code generation.  Normal for loops also seem to have a bad
"gradient", discouraging higher-order abstractions (they need closures,
which get (re-)created on every iteration) and encouraging manual
machine-friendly/human-unfriendly code reordering ("let's just move this
up & put it in front of the loop").)
And although closures are cheap(er) in "vararg code streams", it turns
out that you rarely need them as you can just munch an arbitrary number
of immediates from the stream.  (And that makes it feel kinda like
_really_ low-level machine code at the same time.)

And while this is certainly a bit slower than a plain loop, the speed is
fine – the few cases where I bothered to write it out as explicit loops,
the worst was like 2x or 3x, usually less.


To see how 5.4 changes the performance, I ran a quick test, map^3-ing
over a 16x16x1024-list with a simple comparison…

    function eq( a, b )  return a == b  end
    for i = 1, 1024 do
       local t0 = os.clock()
       map( xsss, map, map, eq, i )
       local t1 = os.clock()
       log( _VERSION, i, t1-t0 )
    end

…(so effectively measuring pure traversal/copy overhead) produces mean
times of 0.0525s (5.3) and 0.0543s (git HEAD)… or roughly +3% / +0.002s,
which is pretty much a negligible difference.  A "raw" traversal
(explicit nested for loops, no funcalls) is faster (0.0297s in 5.3,
0.0204s git HEAD (Wow!!)), but that expands the single map(…)-line into
a whopping 12 lines of code, which just isn't worth it.

Returning to the question…

Can this one avoid vararg create tables in lua 5.4?

function print_with_tag(tag)

return function(...) return print(tag, ...) end

end
luaV_execute in lvm.c says, starting in line 1702:

      vmcase(OP_VARARG) {
        int n = GETARG_C(i) - 1;  /* required results */
        TValue *vtab = vRB(i);  /* vararg table */
        Protect(luaT_getvarargs(L, vtab, ra, n));
        vmbreak;
      }

so the table will always be created – but it seems it's pretty much
exactly as fast as the current approach.

-- nobody