On Thu, Jan 25, 2018 at 1:24 AM, dyngeccetor8 wrote:
your benchmark nicely exploits untypical case.
I believe the following approach is used by a lot of Lua programmers:
instead of creating two user API functions
only one universal function is created
insert(e1, e2, ...)
which has two independent branches of execution inside its body
("insert one" and "insert many").
It is convenient for users to have only one "insert" function in API.
What should we do in Lua 5.4 to avoid significant performance loss
when invoking it with just one argument?
The bright example of such function is Lua standard function math.max(...)
The most frequent use case is invoking it with just two arguments,
but it should work also with arbitrary number of arguments.
You'll unable to write it in pure Lua without significant performance loss
in Lua 5.4, and my benchmark actually shows that performance in Lua 5.4
will be about one third of Lua 5.3.
Splitting max() into two separate functions max2(a, b) and max_many(...)
And I have wonderful idea how to solve this problem.
Let's define "read" and "non-read" access to vararg.
"read" access is:
_ARG[k] (for any Lua value k)
accessing "three dots"
"non-read" access is everything except "read" access.
Examples of "non-read" access are:
saving the reference to _ARG anywhere, passing to a function, using in
an _expression_, writing into the table, etc.
_ARG + 1
The idea: At compile time (prior to building bytecode) a function body
is searched for "non-read" access to _ARG.
If "non-read" access is found anywhere inside the function body, then
this function is implemented in "5.4-style":
1) on function invocation, the table _ARG is created, varargs are stored
inside _ARG, varargs are not stored in the stack;
2) all access to _ARG is usual table access;
3) when "..." is accessed, _ARG is unpacked.
If "non-read" access is NOT found, function is implemented in "5.3-style":
1) no table is created on function invocation, vararg data is stored
in the stack (as in Lua 5.3).
2) "read" access to _ARG (that is, _ARG[k] and #_ARG) is emulated by using
specific bytecodes (EMULTABLE is generated instead of GETTABLE) for reading
data from the stack.
3) when "..." is accessed, it is copied from the stack (as in Lua 5.3).
So, if user only reads from _ARG (this is the most frequent use case),
we have no performance impact (as the table is not created),
and we have O(1) time complexity for _ARG[k] and #_ARG,
and all "push-pop" and "nested-map" vararg magic retains the performance.