lua-users home
lua-l archive


On Mon, Nov 16, 2009 at 09:39:11AM +0100, spir wrote:

> Someone said a thread on lua-list requires some benchmarks. So, I ran tests for this thread. I hope these tests are properly done, because I'm not used to that (please tell me in a PM if ever there are things to change in the testing method) ...
> I was wondering whether performance penalty would mainly come from closure creation at loop initialisation time, or instead that calling a "closured" func with upvalue in a loop would be significantly slower. So, I wrote 2 versions of simple iteration on an array-table with a fake (empty) loop body: one with direct call to iteration func, one with closure. Then measured time (in seconds) for the following processes:
> -1- Many (33,333,333) loops on empty table (meaning each loop initialisation requires closure creation for the closure version, but then in both cases loop process is a single func call + a single test):
> 	result: 10 / 33.
> My interpretation: closure creation is costly. But the test itself does not mirror any meaningful use case.
> -2- Single loop on a table with many (33,333,333) items: 
> 	result: 17 / 19
> My interpretation: this test is meaningful for the actual use case of walking through big arrays; but note there is no work done at all in the loop body, so the test actually only measures the looping mechanics: with some work in the loop, the difference will shrink in proportion. The overhead of using a closure on an empty loop is on the order of 10%. (As far as I'm concerned, that's already negligible, even with an empty loop.)
> Then for further information: I wondered about the "weight" (in time) of actual iteration through items, compared to loop initialisation. So, I repeated the first case for arrays of 10 items:
> -3- Many (3,333,333) loops on 10-items table. (Beware outside loop count is /10.)
> 	result: 10 / 16.
> My interpretation: once multiplied by ten, we get 100 / 160. The first case is as expected: loop init is nearly nothing, then we simply call the func 10 times more. For the closure case, we can evaluate each func call at ~ 13, and the loop init/closure creation at ~ 20 (for repeating the whole thing 33,333,333 times).
> As a last test, I added some actual operation in the loop body (copy of the table):
> -4- single loop on 11,111,111-items array + copy
> 	result: 14 / 15
> My interpretation: the times here are rather unstable (the other tests were highly stable, with only +/- 1 variation), varying with ~ 30% uncertainty on my computer. Sometimes the closure version is even faster.
> Anyway, I would interpret this as meaning that, in the general case with non-empty arrays and some job in the loop body, the closure overhead is "very tiny".

The performance hit from using closures isn't going to come from time
spent creating the closure; that's pretty fast. The problem is that all
those closures use heap space, and if you create a lot of them, they're
going to need to be garbage collected later. If you only have one
loop---it doesn't matter how many iteration steps it involves---then
only one object is getting created, and so the garbage collection cost
will be negligible. What you're basically testing here is how much it
slows down a function to look up its upvalues. And the answer is:
hardly any.

But if you have a lot of loops in your code and every loop creates a
closure (that might have otherwise been avoided, see below), then
there's going to be a lot more garbage that needs collecting and that's
where the performance costs will be.

Note that what matters is how many loops there are, not how many
iterations there are in a given loop.

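To make the two shapes concrete, here is a minimal sketch (the
function names are mine, not from the benchmark above) of a stateless
ipairs-style iterator, which allocates nothing per loop, next to a
closure-based one, which allocates a fresh closure every time a loop
starts:

```lua
-- Stateless iterator: the state (t) and control variable (i) travel
-- as arguments, as with ipairs, so starting a loop allocates nothing.
local function iter(t, i)
  i = i + 1
  local v = t[i]
  if v ~= nil then return i, v end
end

local function stateless_pairs(t)
  return iter, t, 0  -- no heap object created here
end

-- Closure-based iterator: every call allocates a fresh closure that
-- captures t and i as upvalues; many loops mean much more garbage.
local function closure_pairs(t)
  local i = 0
  return function()
    i = i + 1
    local v = t[i]
    if v ~= nil then return i, v end
  end
end

-- Both are used the same way:
--   for i, v in stateless_pairs(t) do ... end
--   for i, v in closure_pairs(t) do ... end
```
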
In some cases the second and third arguments to an iteration function
(the invariant state and the control variable) aren't enough and you
have to keep track of more state. One way to do this is by making the
second argument a temporary table; another is by using a closure.
Either way you're going to face the performance cost of a new object
that needs to be garbage collected. I just go with whichever object
would be smaller (a closure is smaller for up to about 4 upvalues;
after that the table is smaller).
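As an illustration of those two options (the slice example and the
names are hypothetical), here is an iterator over a sub-range [from,
to] of an array; both versions allocate exactly one heap object per
loop:

```lua
-- Option 1: pack the extra state into a temporary table, passed as
-- the generic-for's invariant state argument.
local function slice_iter(st, i)
  i = i + 1
  if i <= st.to then return i, st.t[i] end
end

local function slice_table(t, from, to)
  return slice_iter, {t = t, to = to}, from - 1  -- one table allocated
end

-- Option 2: capture the same state as upvalues in a closure.
local function slice_closure(t, from, to)
  local i = from - 1
  return function()                              -- one closure allocated
    i = i + 1
    if i <= to then return i, t[i] end
  end
end
```
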