lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Fri Aug  1 12:38:15 2014
siffiejoe at gmx.net (Philipp Janda) wrote:

> Am 01.08.2014 13:31 schr?bte Luiz Henrique de Figueiredo:
> >> Is there a reason why the protocol for `ipairs` has changed from
> >> "iterate to first nil value" to "iterate to #table"? The first is well
> >> defined for all tables, and runs in O(n), while the second needs
> >> O(nlog(n)) time
> >
> > It's O(n+log(n))=O(n) time.
> >
> 
> Isn't `ipairsaux` (which in turn calls `luaL_len`) called for every 
> iteration?

Hello,

I have recently been writing a JSON parser[1] to be used by our Lua web
framework[2].

Since JSON arrays may contain null values, and since one of the
library's goals is to represent JSON null values as nil, dealing with
sparse arrays is a necessity to us.

My idea was to use the __len and __ipairs metamethods of Lua 5.2 to
implement those sparse arrays in such way that the length operator will
return the length of the JSON array (including its null values), and
that ipairs will iterate through the whole JSON array (also including
its null values represented as nil):

for i, v in ipairs(json.import('[1,2,null,4]')) do print(v) end
-- prints:
-- 1
-- 2
-- nil
-- 4

To me, the documentation of Lua 5.2 doesn't seem to indicate whether
this approach is semantically correct (i.e. whether iteration using
ipairs may return nil values), but it looked reasonable to me to
proceed as explained above. I also believe that sparse arrays (where
you want to represent holes as nil) are not totally uncommon.

Now I noticed that Lua 5.3 aims to remove the __ipairs metamethod and
instead let ipairs(...) show customizable behavior by respecting the
__len and __index metamethods.

According to the current(!) documentation of Lua 5.3.0 alpha, ipairs
behaves as follows:

========================
ipairs(t)

Returns three values (an iterator function, the table t, and 0) so that the construction

for i,v in ipairs(t) do body end
will iterate over the pairs (1,t[1]), (2,t[2]), ..., (#t,t[#t]).

The table should be a proper sequence or have a __len metamethod (see §3.4.7).
========================

In my case, this behavior is quite handy and reduces the code of my
library. It comes at a cost though, as previously pointed out by
Philipp Janda (see above). If we look at the C code of Lua 5.3.0 alpha,
we see:

> static int luaB_ipairs (lua_State *L) {
>   lua_CFunction iter =
>      (luaL_getmetafield(L, 1, "__len") ||
>       luaL_getmetafield(L, 1, "__index"))
>         ? ipairsaux : ipairsaux_raw;
>   lua_pushcfunction(L, iter);  /* iteration function */
>   lua_pushvalue(L, 1);  /* state */
>   lua_pushinteger(L, 0);  /* initial value */
>   return 3;
> }

(assuming LUA_COMPAT_IPAIRS is undefined)

Thus, if a table has either a __len or an __index metamethod, then a
special iterator function (the internal C function ipairsaux) is
returned as first return value of the iterator triplet:

> static int ipairsaux (lua_State *L) {
>   int i = luaL_checkint(L, 2) + 1;
>   if (i > luaL_len(L, 1)) {  /* larger than length? */
>     lua_pushnil(L);  /* end traversal */
>     return 1;
>   }
>   else {
>     lua_pushinteger(L, i);
>     lua_pushinteger(L, i);  /* key for indexing table */
>     lua_gettable(L, 1);
>     return 2;
>   }
> }

In order to determine the break condition, ipairsaux will call luaL_len
(and thus possibly the __len metamethod) for every iteration step. This
seems unnecessary overhead to me, and if we assume that the
determination of the length requires in O(log(n)) time, the total
overhead would be of O(n*log(n)), as previously pointed out by Philipp
Janda (see above).

By principle, however, it is not possible to reduce this overhead,
because the way the for-loop works in Lua, we may only pass one Lua
value (the second value of the triplet) as state. Unless luaB_ipairs
creates either a closure or a table that contains the length
information (and thus the termination point of the iteration), we
cannot remember the length and will have to redetermine it during every
iteration step. Creating tables or closures, however, would have an
even greater perfomance impact.

This may be an argument to modify ipairs in such way to always let it
abort at the first nil value it encounters (even when metamethods are
used). But I would strongly dislike such behavior: That way, sparse
arrays could not be implemented that support the ipairs interface.

The behavior defined by the Lua 5.3.0 documentation, where the
iteration runs from 1 to #t) looks acceptable to me, even though its
implementation seems - by principle (as explained above) - inefficient
to me. This is why I would like to discourage from dropping the
__ipairs metamethod. Keeping the __ipairs metamethod would allow for

a) full customization of the behavior of ipairs iteration

 while

b) avoiding extra overhead (length determination) for every
   iteration step in case of customized behavior.


I think it may be helpful to define the semantics of ipairs(t) before
deciding on its behavior. For me, the following questions are of
particular interest:

1) Is it semantically correct to provide metamethods resulting in
   "for i,v in ipairs(t)" iterating over nil values, or should I use a
   different interface for sparse array iteration?

2) Will Lua 5.3 keep supporting metamethod-based customization of
   ipairs in such way that it can iterate over nil values? (yet
   supported by Lua 5.3 alpha, despite the issues explained above)


Kind Regards
Jan Behrens


[1] http://www.public-software-group.org/json_for_lua
[2] http://www.public-software-group.org/webmcp