lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Wed, Jun 1, 2011 at 8:18 AM, Dirk Laurie <dpl@sun.ac.za> wrote:
> Every now and then I reread Chapter 7 of PIL.  I tell myself: once
> I fully understand that, I'll consider myself a non-newbie.  I have
> not yet passed the test ...
>
> I have a question for non-newbies, at the end of this post.  It deals
> with the problem:
>    Write an iterator over the values of a list.
>
> Yes, I know I can do it with
>    for _,v in ipairs(a) do -- use v in here
>        end
> but I find myself forgetting that dummy variable too often.
>
> Yes, I know I can do it with
>    for i=1,#a do -- use v=a[i] in here
> but that's inconvenient when a is an expression.
>
> I can understand the logic of this Section 7.1 type iterator:
>    values = function(a) -- iterator over values of list 'a'
>        local i=0
>        return function() i=i+1; return a[i] end
>        end
>
> I find the rest of Chapter 7 hard going.  So here's the question to
> all you experts out there (it's the same question asked three times):
>    Is this simplistic but readable iterator so very inefficient?
>    Are situations in which the difference between this and one of the
>        more sophisticated iterators of Section 7.2 etc is crucial to
>        performance, so very common?
>    Am I wrong in thinking that if what I want to use the value in
>    a nontrivial way, it matters not one whit how I iterate for it?
>
> Dirk
>
>
>

The rest of Chapter 7 talks about so-called "stateless" iterators,
which does not help you create the kind of iterator you want to make.
You have already ruled them out - you say that you forget that "dummy
variable" of ipairs() too often, so you want to eliminate it. But that
first value is the key to making the stateless iterator work, because
it is also passed back to the iterator function on the next iteration,
letting it know which value to return next.

As to whether it really matters all that much whether you use a
stateful iterator, which you can easily understand, or a stateless
iterator, which is trickier: the key is that a stateful iterator must
allocate *some* memory, while a stateless iterator does not (although
of course, depending on how it is written, it still might do). As an
example, here is a test which turns the garbage collector off and sees
how much memory has been allocated after a million empty ipairs()
loops, vs. a million empty values() loops:


 -- stateful alternative to ipairs
 function values(t)
   local i = 0
   return function()
     i = i + 1
     return t[i]
   end
 end

 collectgarbage 'stop'

 mytable = {}

 print('baseline:', collectgarbage 'count')
 for _ = 1, 1e6 do
   for i, v in ipairs(mytable) do
     -- ...
   end
 end
 print('ipairs(): ', collectgarbage 'count')
 for _ = 1, 1e6 do
   for v in values(mytable) do
     -- ...
   end
 end
 print('values():', collectgarbage 'count')


For me, the baseline memory allocation is about 22 KB. After a million
ipairs() it is still about 22 KB, but after a million values() it goes
up to over 80 MB.

Of course, that's with the garbage collector turned off. But it
represents the amount of work that the garbage collector has to do,
and *that* is what can affect performance. Imagine if you had Lua
embedded in something where you want to run a small simple script many
times a second. If that script uses a lot of stateful iterator loops,
the amount of time that Lua has to spend in the garbage collector
cleaning up after them all starts to increase.

In summary, stateless iterators only work at all for certain simple
kinds of iteration anyway (for example, string.gmatch() returns a
stateful iterator, it just couldn't work as a stateless one) and the
difference in performance with stateful iterators is only likely to
come up if you are using them a *lot*, on the order of hundreds of
thousands of times.

-Duncan