Sequence Adapters

lua-users home
wiki

A Need for Better Notation for Sequence Operations

Sequences are sequential streams of values that are produced by an iterator function. In the simplest form, they are represented by a function (or callable object) which returns a number of values each time it is called, with nil indicating the end. Sequences can be single-valued (like io.lines) or multi-valued (like pairs) and mostly they produce values of the same type. It is useful to extract some common operations we use with sequences and tables, and make them available as a library. This is one of the main goals behind the PenlightLibraries.

(Please note that this is an idiosyncratic definition: the Lua manual defines sequence as "a table where the set of all positive numeric keys is equal to {1..n} for some integer n" [1].)

With pl.seq together with the placeholder expressions provided by pl.func we can say:

seq.printall(seq.filter(seq.list{1,2,3,4},Gt(_1,2)))

Which is fair enough, but awkward; I personally would find it hard to motivate this rather than a simple loop. Of course, this is a toy example, but more complicated combinations become even harder for humans to parse. Sequence adapters make it possible to express the same operation as a method chain:

S{1,2,3,4}:filter(Gt(_1,2)):printall()

This is more readable, for several reasons. The first is that the qualifying seq. isn't everywhere, and the second is that we are culturally conditioned to read an operation sequence from left to right, unlike function application which is right to left. (This is also the famous pipeline metaphor made so popular by Unix.) Here is another example, where the length operator is mapped over a sequence of strings:

S{'one','tw','t'} :map '#' :printall()  --> output: 3 2 1

It is more interesting to use a real sequence, such as generated by io.lines. This creates a sequence of all unique lines in a file, and then copies it to a table.

ls = S(io.lines(fname)):unique():copy()

Another pattern which would be very convenient is to apply methods to all elements of the sequence:

S{'[one]','[two]','[three]'}:sub(2,-2):upper():printall() --> output: ONE TWO THREE

Here is a more involved example; given a Lua code string, find all the variable names. lexer.lua will create a double-valued sequence, with the first value the token type and the second the token value. The first value is used to filter the token stream so that only variables ('iden') are passed through; map(_2) only lets the values through, unique collects the names, and finally copy turns these into a table.

str = 'for i=1,10 do for j = 1,10 do print(i,j) end end'
ls = S(lexer.lua(str)):filter(Eq(_1,'iden')):map(_2):unique():copy()
print(List(ls))
---> output: {i,print,j}

The source is [here].

Implementation

Since there is no type that uniquely indentifies a sequence (apart from it being callable) we need a wrapper object. The iterator function is put inside a table and a metatable is attached so we can control method lookup.

The __index metamethod is called whenever an unknown method is called, and the method is then looked up in the seq table. However, this function can't be used directly, since (a) it is expecting a sequence as its first argument and (b) it must return its results in a sequence wrapper so we can keep the method chain going.

SMT = {
	__index = function (tbl,key)
		local s = seq[key]
		if s then
			return function(sw,...) return S(s(sw.iter,...)) end
		end
	end,
}

function S (iter)
	return setmetatable({iter=iter},SMT)
end

This demonstrates the concept, but in practice doesn't handle all the necessary cases. Some seq functions return plain values (like reduce) or tables (like copy), and wrapping these would be confusing. Functions like map have their arguments the wrong way around (function first, then sequence). It would be convenient that S can wrap a table as a sequence, but should not do so for the result of copy. And so forth. And there is still the issue of how to do methods upon the sequence values. The key is this function in seq:

function mapmethod (iter,name,arg1,arg2)
	local val = iter()
	local fn = val and val[name]
	return function()
		if not val then return end
		local res = fn(val,arg1,arg2)
		val = iter()
		return res
	end
end

Unlike plain map, mapmethod is given a function name, which it looks up in the first value generated by the sequence - thereafter returns a sequence which is the result of applying that function to every value. (As a convenience, the first two extra arguments are explicitly captured for the function call). So the strategy in the __index metamethod when it cannot look up a function name is to call mapmethod with the name of the method.


-- seqa.lua
local seq = require 'pl.seq'

-- can't look these directly up in seq because of the wrong argument order...
local overrides = {
	map = function(self,fun)
		return seq.map(fun,self)
	end,
	reduce = function(self,fun)
		return seq.reduce(fun,self)
	end
}

SMT = {
	__index = function (tbl,key)
		local s = overrides[key] or seq[key]
		if s then
			return function(sw,...) return SW(s(sw.iter,...)) end
		else
			return function(sw,...) return SW(seq.mapmethod(sw.iter,key,...)) end
		end
	end,
	__call = function (sw)
		return sw.iter()
	end,
}

function callable (v)
    return type(v) == 'function' or getmetatable(v) and getmetatable(v).__call
end

function S (iter)
	if not callable(iter) then
		if type(iter) == 'table' then iter = seq.list(iter)
		else return iter
		end
	end
	return setmetatable({iter=iter},SMT)
end

function SW (iter)
	if callable(iter) then
		return setmetatable({iter=iter},SMT)
	else
		return iter
	end
end

SteveDonovan


RecentChanges · preferences
edit · history
Last edited July 4, 2012 10:25 am GMT (diff)