Extending For And Next

lua-users home
wiki


[!] VersionNotice: The below code pertains to an older Lua version, Lua 4. It does not run as is under Lua 5.

An experimental patch to Lua 4.0

The Lua API makes it quite easy to integrate "foreign" objects, including foreign maps, but does not provide a facility to iterate over them. That is, by defining gettable and settable tag methods, it is feasible to completely integrate an indexed foreign object into the Lua environment, but this does not extend to the use of for k,v in foreign_map or, for that matter, k,v = next(foreign_map, k). Both of these constructions can be useful.

I was also intrigued by the possibility of extending next to "generator" functions. A generator function works exactly like the Lua next library function; given an iteration-state, it produces the next iteration-state and a value. To take a concrete example, I might define a generator as a closure on a character string, so that I could say:

keywords = words "do else elseif end for if in repeat unless while"
for i, v in keywords do
   dict[v] = (dict[v] or 0) + 1
end

which struck me as fairly elegant.

It turned out that the patch was quite simple, so I've posted it in the file area for those who might want to play with it.

The Files

Usage

The patch provides a new tag method, "next", which is invoked by the for k,v in obj do end construction; in the next library function; and in the lua_next API. Consistent with what seems to be Lua style, a new rawnext library function and lua_rawnext API are provided which avoid the use of the tag method. These work just as tables do, with the exception that they return and use "iteration states" rather than "keys". If the object also defined gettable and settable tag methods, it would probably be wise to arrange for a key and an iteration state to be the same thing; however, for a pure iteration object, the iteration state could be anything provided that nil indicates both the beginning and the end of the iteration. In fact, there is no need for it to be distinct on each iteration.

In addition, the patched constructions can iterate over a function rather than a table or an object with the "next" tag method; the function is called with a single argument, which is an iteration state, and must return two arguments, the first of which is the next iteration state and the second of which is the associated value. This is really only of cosmetic value in the case of next and lua_next but it avoids having to know whether the iteratee is functional or objective.

Examples

The words generator described above is defined as follows:

function words(str)
  return
    function(k)
      local _, k, v = strfind(%str, "([%w]+)", (k or 0) + 1)
      return k,v
    end
end

As can be seen, it retains its argument as a closure, and uses the index of the last character in each word as an iteration state.

Similarly, I could define

function wordsInFile(file)
  return
    function(k)
      k = read(%file, "*w")
      return k, k
    end
end

In this case, the iteration state is essentially meaningless during the iteration; it serves only to signal termination. Obviously, more sophisticated implementations are possible.

Filters and transformers

Filters and transformers take a generator and return another generator. A simple example of a transformer is

function toupper(gen)
  return
    function(k)
      local v
      k, v = next(%gen, k)
      if k then return k, strupper(v) end
    end
end

This can be generalised:

function gmap(fn, gen)
  return function(k)
    local v
    k, v = next(%gen, k)
    return k, k and %fn(v)
  end
end

Now I can write things like:

for i, v in toupper(words(str)) do
  -- something
end

A filter is rather like a transformer but is not one-to-one. A sample filter is similar to the Perl grep function:

-- given a predicate and a generator, generate only those
-- values for which the predicate returns true
function gfilter(fn, gen)
  return function(k)
    local v
    k, v = next(%gen, k)
    while k do
      if %fn(v) then return k, v end
      k, v = next(%gen, k)
    end
  end
end

I hope that gives a taste of the possibilities.

Implementation Notes

The majority of the patch is in lvm.c. Here I modify the opcodes LFORPREP and LFORLOOP to use the new vm function luaV_next, which implements the switch on ttype and tag method. next and lua_next are also modified to use this same function, producing (I hope) consistent results.

This does slow for loops down slightly, since the test is done every time through the loop. Timing tests on a P3/866 with 128MB RAM running FreeBSD 3.2 (my development machine) showed that a very basic for loop was approximately 10% slower with the patch. (On the other hand, next is about 4% faster.) I have a somewhat more complex patch which caches the tag method on the stack, resulting in a slowdown of only 7%, but I'm not yet happy enough with it to release it. I believe that the 4.1alpha architecture lends itself better to this, and I'm currently working my way through the 4.1alpha VM in an attempt to do it.

No changes are made to the parser or to the stack layout during a for loop, so the patched Lua ought to be binary-compatible; it only provides additional facilities. However, I wouldn't swear to that statement.

Future directions

I regard all of this as an interesting experiment. I think the sample code provided with the patch shows the expressive power of the new construct, quite apart from the ability to define iteratable userdata.

However, the confusion between iteration state and key continues to bother me. I note that in 4.1alpha, the list for is actually implemented with an iteration state and a key, which I think is more logical (and ought to be slightly faster); I'd be interested in the possibilities of extending the semantics of for loops and the next library function to allow for this. While it is often the case that a key can be used as an iteration-state, it is not always convenient (as the words example shows).

While keeping the iteration state hidden during a for loop is tempting, it reduces flexibility. The advantage of using a for loop and a generator rather than a traditional map function is primarily that it is possible to abandon an iteration or modify it. That is, it should (at least) be possible to use next inside a for loop in order to skip over elements. In some cases, it may also be possible to restart an iteration by assigning to the iteration-state variable.

Clearly, generators come in many categories and need to be adequately documented. They may be:

At the same time, the key->value model is a little rigid. In the case of a vector, the key may not be very interesting to me; in the case of a database, I might want to use a generator which returns more than one value. In theory, this could be accomodated by extending the syntax of for to something like for name_list in exp1 [, exp1] ... (the second exp1 would be a starting iteration state with nil as a default).

I welcome any thoughts; you can reach me at rlake(at)oxfam.org.pe. Or just put them here, or on the mailing list.

I suggest that you make the patch using options "-urN" and include the test programs in the patch. --JohnBelmonte


RecentChanges · preferences
edit · history
Last edited December 31, 2006 12:18 am GMT (diff)