# Simpler For Iterator

The current model of iterator "for" loops could possibly be simplified. This page is both an attempt to explain the present model and an introduction to a simpler alternative. The complication seems to come from:
• the difference between a collection iterator and a generator iterator,
• the mixing of two interfaces: one to the iterator proper and one to the iterator function.

```lua
for A, data in iterator_func, X, Y do block end
```

Data is the actual data returned by the function and later used in the block. A, X, and Y are explained further below. Here is a possible implementation of both a collection iterator and a generator iterator, based on the tutorial example (written to be very explicit, and starting at 1 for a change):

```lua
-- collection iterator --
numbers = {1,3,5,7,9,11,13}
function coll_squares(coll)
  -- iterator func: receives the collection and the current index
  local function next_square(coll, index)
    if index > #coll then
      return nil
    end
    local n = coll[index]
    return index+1, n*n
  end
  -- returns the iterator func, the collection, and the start index
  return next_square, coll, 1
end
for i, square in coll_squares(numbers) do print (square) end     --> OK

-- generator iterator --
function gen_squares(limit)
  -- iterator func: receives the limit and the current number
  local function next_square(limit, number)
    if number > limit then
      return nil
    end
    return number+1, number*number
  end
  -- returns the iterator func, the limit, and the start number
  return next_square, limit, 1
end
for n, square in gen_squares(7) do print (square) end     --> OK
```

So, what are A, X, and Y? In the case of a collection:

• A is the last index, received from the func and passed back to it on the next call,
• X is the collection,
• Y is the start index, passed to the func at the first call.

In the case of a generator:

• A is the last 'term' (the thing on which the generator function operates),
• X is the limit,
• Y is the start term.

It is difficult to find common ground from which to explain and name A, X, and Y meaningfully. X is called 's' in the reference manual and 'state' in the tutorial. In the reference manual, A is called var_1, while Y is called var. Here is an attempt to make sense of that:

• A tells where we are in the sequence of iterations. Could it be called "mark" (as in "landmark")?
• X defines the set of terms to be operated on. Could it be called "range"?
• Y determines the start term: the first or the "zeroth", depending on how the iterator function is written. Let's call it "start".

[If anyone finds better names...] In addition to their use in yielding the next data, the mark and the range are also used together to know when to stop iterating. It is not trivial to guess what the iterator and the iterator func are supposed to return, what the func implicitly receives from Lua, and the proper order of all these values.
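To make that order of values more concrete, here is a rough sketch of what the loop does with them, following the while-loop equivalence given in the reference manual and simplified to two loop variables. The helper name `run_generic_for` and the `body` callback are hypothetical, introduced only for illustration; they are not part of Lua.

```lua
-- Hypothetical helper: emulates `for A, data in f, X, Y do body(A, data) end`
-- along the lines of the equivalence in the reference manual.
local function run_generic_for(f, s, var, body)
  while true do
    local var_1, data = f(s, var)   -- the func receives the range (X) and the current mark
    var = var_1                      -- its first result becomes the new mark
    if var == nil then break end     -- a nil mark ends the loop
    body(var_1, data)
  end
end

-- Iterator func in the style of the collection example above.
local function next_square(coll, index)
  if index > #coll then return nil end
  local n = coll[index]
  return index + 1, n * n
end

local results = {}
run_generic_for(next_square, {1, 3, 5}, 1, function(_, square)
  results[#results + 1] = square
end)
print(table.concat(results, " "))   --> 1 9 25
```

Seen this way, the start value is only the first mark, and the func's first return value is the one that both feeds the next call and decides when the loop stops.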

The code above may be rewritten as follows:

```lua
-- collection iterator --
function coll_squares(coll)
  local index = 1
  local coll = coll       -- just to make things clear
  local function next_square()
    if index > #coll then
      return nil
    end
    local n = coll[index]
    index = index+1
    return n*n
  end
  return next_square
end
for square in coll_squares(numbers) do print (square) end     -- OK

-- generator iterator --
function gen_squares(limit)
  local number = 1
  local limit = limit     -- ditto
  local function next_square()
    if number > limit then
      return nil
    end
    local n = number
    number = number+1
    return n*n
  end
  return next_square
end
for square in gen_squares(7) do print (square) end     -- OK
```

There are a few differences, all of which are simplifications except the last one:

• The iterator only returns the iterator func.
• The func takes no parameter.
• The func only returns the actual data.
• A, X, Y are not used at all.
• The startup "mark" is set by the iterator.

The last point makes the mark (index or number) a local variable of the iterator, which the nested func, a _closure_, reaches as an upvalue (right?). The "range" can likewise be just a local variable of the iterator, so there is no need to pass it explicitly as an argument to the function. (Please correct anything that is wrong here, including terminology.)
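A small sketch may help confirm the upvalue reading. Assuming standard Lua closure semantics, each call to the iterator creates a fresh local mark, so two iterators built from the same function advance independently. The name `counter` is hypothetical and only mirrors the shape of `gen_squares`.

```lua
-- Hypothetical factory with the same shape as gen_squares.
local function counter(limit)
  local number = 0               -- the mark: a fresh upvalue for each closure
  return function()
    if number >= limit then return nil end
    number = number + 1
    return number
  end
end

local a = counter(3)
local b = counter(3)
local a1 = a()
local a2 = a()
local b1 = b()                   -- b has its own mark, untouched by calls to a
print(a1, a2, b1)                -- prints 1, 2 and 1
```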

We can imagine more complex cases, e.g. specifying the generator interval. The additional data simply become parameters of the iterator:

```lua
-- generator iterator --
function gen_squares(start, stop, step)
  local number = start
  local function next_square()
    if number > stop then
      return nil
    end
    local n = number
    number = number+step
    return n*n
  end
  return next_square
end
for square in gen_squares(3,9,2) do print (square) end     --> OK
```

Likewise, if we make a collection iterator more complex (here rather artificially):

```lua
-- collection iterator --
-- (math is a standard library, available without require)
numbers = {1,3,5,7,9,11,13,15,17}
function coll_squares(coll, modulo)
  local index = 1
  local function number_filter()
    -- return next number in coll multiple of modulo, else nil
    while index <= #coll do     -- <= so that the last element is examined too
      local number = coll[index]
      if math.fmod(number, modulo) == 0 then
        return number
      end
      index = index+1
    end
    return nil
  end
  local function next_square()
    -- yield squares of multiples of modulo in coll
    local n = number_filter()
    if not n then
      return nil
    end
    index = index+1
    return n*n
  end
  return next_square
end
for square in coll_squares(numbers, 3) do print (square) end     --> OK
```

In all cases, it seems A, X, and Y are not needed. This way of implementing iterators makes good use of basic Lua features: funcs as values, nested funcs, closures/upvalues. So the question is: can we simplify the interface between the "for" syntax, the iterator, and the iterator func by getting rid of A, X, and Y? If yes, a new syntax could be:

```lua
for data in iterator_func do block end
```

While the present one is:

```lua
for A, data in iterator_func, X, Y do block end
```

As a consequence, the variety of iterators would no longer be covered by the syntax itself, in a rather complicated manner, but left to the user's implementation instead. It would certainly be easier to learn and explain both the syntax and the proper way to write an iterator for a given task.

The reference manual states:

<< f, s, and var are invisible variables. The names are here for explanatory purposes only. >>

In the present proposal, they do not exist at all. The necessary data is passed as parameters to the iterator, as is done now: collection, bounds, or whatever.
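For comparison, here is a sketch of what a closure-based iterator receives from the present "for" loop, assuming the standard semantics described in the reference manual: the invisible variables are still there, and the closure simply ignores them. The names `probe_squares` and `seen` are hypothetical, used only to record the arguments.

```lua
-- Hypothetical closure-based iterator that also records its arguments.
local seen = {}
local function probe_squares(limit)
  local number = 1
  return function(s, var)
    -- s is always nil here; var is the previous first result (nil on the first call)
    seen[#seen + 1] = tostring(s) .. "," .. tostring(var)
    if number > limit then return nil end
    local n = number
    number = number + 1
    return n * n
  end
end

for square in probe_squares(3) do end
print(table.concat(seen, " "))   --> nil,nil nil,1 nil,4 nil,9
```

Since the iterator returns only a function, the range and the start are both nil, and the mark is just the last square returned.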

(first page formulation by DeniSpir)

Last edited November 13, 2009 2:41 pm GMT