[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Why is nil not a valid table key?
- From: Rici Lake <lua@...>
- Date: Sat, 26 Nov 2005 11:29:51 -0500
On 26-Nov-05, at 10:29 AM, Mike Pall wrote:
Lua signals the end of a table traversal with a nil key. This is
both simple and efficient.
I'm not convinced of that. In fact, I think that in retrospect, the Lua
iteration protocol is suboptimal, and that the extra "flag" is closer.
Consider the alternative of using a hidden state object. This is not a
major change to the iteration protocol; it remains quite simple:
Currently we have:
for <varlist> in func, obj, state do <block> end
==>
do
local _f, _o, _s = func, obj, state
while true do
local <varlist> = _f(_o, _s)
_s = <varlist>
if _s == nil then break end
<body>
end
end
Contrast that with a very minor change:
do
local _f, _o, _s = func, obj, state
while true do
local <varlist>
_s, <varlist> = _f(_o, _s)
if _s == nil then break end
<body>
end
end
The only difference is that the iteration state is now hidden. To
support that style, the function returned by pairs(t) would have to
return something like: <key, key, value>, but that's not an enormous
change; in effect it moves a stack copy from the iteration loop to the
iterator.
Where you see a difference is in iterators which have a natural state
which is not useful or possibly even is confusing to reveal. The
simplest example is gfind, but there are many others. In fact, it would
seem that the vast majority of non-trivial iterators are of this form.
Specifically, gfind needs to maintain the current string offset, which
is the end of the current match, and consequently would be confusing
(and probably not generally useful) to reveal to the iteration loop. In
order to properly hide this state, gfind needs to create a closure and
provide that as an iterator. Thus, every use of gfind involves the
allocation of a closure.
If the for statement expanded as in the proposal above, this would not
be the case; one could write a gfind which always returned a constant
closure object (i.e. a simple function) and used the hidden state to
maintain the only piece of information necessary to continue the
iteration.
Here's another example: it's pretty straightforward to implement a
table-like object which maintains insertion order. The simplest
solution is to simply maintain two tables: the "real" table, which maps
keys onto values, and an insertion order table which maps integers onto
keys. Now the iteration state is actually the current index in the
insertion order table, but we want the iterator to return <key, value>
in order to be compatible with normal iterations. Consequently, we have
two alternatives: either create a closure in order to retain the
current iteration state (which is, remember, just an integer), or
maintain a third table with each object which maps keys onto integers.
In either case, there is quite a lot of overhead, and I would say that
the overhead is a consequence of the iteration protocol. Using the
proposal above, the state could be safely passed between calls to the
iterator and ordered_pairs() could return a constant function, without
adding any overhead to the underlying object.
If you look at the actual implementation of next(), you'll see that it
is actually using an integer as the current iteration state; in order
to avoid creating a specific closure for each iterated table, next()
recomputes this integer on each loop, which doubles the number of hash
lookups necessary to do a table iteration. This is not an enormous
overhead, but it is an overhead; again, it is created by the nature of
the iteration protocol; the "hidden state" iteration protocol would
entirely avoid it.
So, in summary, the "hidden state" iteration protocol is just as simple
as the existing iteration protocol; it is more efficient; and it does
not require the reservation of a value to indicate the end of the
iteration.