lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



On 26-Nov-05, at 10:29 AM, Mike Pall wrote:

Lua signals the end of a table traversal with a nil key. This is
both simple and efficient.

I'm not convinced of that. In fact, I think that in retrospect, the Lua iteration protocol is suboptimal, and that the extra "flag" is closer.

Consider the alternative of using a hidden state object. This is not a major change to the iteration protocol; it remains quite simple:

Currently we have:

  for <varlist> in func, obj, state do <block> end

  ==>

  do
    local _f, _o, _s = func, obj, state
    while true do
      local <varlist> = _f(_o, _s)
      _s = <varlist>
      if _s == nil then break end
      <body>
    end
  end

Contrast that with a very minor change:

  do
    local _f, _o, _s = func, obj, state
    while true do
      local <varlist>
      _s, <varlist> = _f(_o, _s)
      if _s == nil then break end
      <body>
    end
  end

The only difference is that the iteration state is now hidden. To support that style, the function returned by pairs(t) would have to return something like: <key, key, value>, but that's not an enormous change; in effect it moves a stack copy from the iteration loop to the iterator.

Where you see a difference is in iterators which have a natural state which is not useful or possibly even is confusing to reveal. The simplest example is gfind, but there are many others. In fact, it would seem that the vast majority of non-trivial iterators are of this form.

Specifically, gfind needs to maintain the current string offset, which is the end of the current match, and consequently would be confusing (and probably not generally useful) to reveal to the iteration loop. In order to properly hide this state, gfind needs to create a closure and provide that as an iterator. Thus, every use of gfind involves the allocation of a closure.

If the for statement expanded as in the proposal above, this would not be the case; one could write a gfind which always returned a constant closure object (i.e. a simple function) and used the hidden state to maintain the only piece of information necessary to continue the iteration.

Here's another example: it's pretty straightforward to implement a table-like object which maintains insertion order. The simplest solution is to simply maintain two tables: the "real" table, which maps keys onto values, and an insertion order table which maps integers onto keys. Now the iteration state is actually the current index in the insertion order table, but we want the iterator to return <key, value> in order to be compatible with normal iterations. Consequently, we have two alternatives: either create a closure in order to retain the current iteration state (which is, remember, just an integer), or maintain a third table with each object which maps keys onto integers. In either case, there is quite a lot of overhead, and I would say that the overhead is a consequence of the iteration protocol. Using the proposal above, the state could be safely passed between calls to the iterator and ordered_pairs() could return a constant function, without adding any overhead to the underlying object.

If you look at the actual implementation of next(), you'll see that it is actually using an integer as the current iteration state; in order to avoid creating a specific closure for each iterated table, next() recomputes this integer on each loop, which doubles the number of hash lookups necessary to do a table iteration. This is not an enormous overhead, but it is an overhead; again, it is created by the nature of the iteration protocol; the "hidden state" iteration protocol would entirely avoid it.

So, in summary, the "hidden state" iteration protocol is just as simple as the existing iteration protocol; it is more efficient; and it does not require the reservation of a value to indicate the end of the iteration.