On Thu, 14 Aug 2014 16:37:40 -0300
Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:

> > [...]
> > 
> > 3. Removing the __ipairs metamethod mechanism is not just about
> >    the overhead (and speed) of the # operator:
> > 
> >    a) Beside the repetitive call of luaB_ipairs, it is not
> >       possible to return a precalculated value as second item of
> >       the iterator triplet that may speed things up.
> > 
> >    [...]
> 
> First, many thanks for the data. We will work over it.
> 
> Second, your point 3 is very elucidative, and brings me another
> question: you are doing something quite different from the original
> ipairs. Is there any special reason why you have to use ipairs for your
> iteration? The whole idea of the for loop in Lua is that people can
> build their own iterators. Is there a specific reason why yours has to
> be packed inside ipairs?
> 
> -- Roberto
> 

What exactly do you mean by this:
> you are doing something quite different from the original ipairs.

When I read about the plan to remove the __ipairs metamethod in
Lua 5.3, I began to doubt whether my approach of using ipairs(...) for
iteration is semantically correct. In other words: Should I be using the
ipairs mechanism at all? (In Lua 5.2 it works fine in practice, but I'm
talking about semantics here.)

I already raised the question of semantics in my initial post:

On Wed, 13 Aug 2014 22:33:23 +0200
Jan Behrens <jbe-lua-l@public-software-group.org> wrote:

> I think it may be helpful to define the semantics of ipairs(t) before
> deciding on its behavior. [...]

In Lua 5.1, where #t always returned the raw length of a table, I saw
"for i, v in ipairs(t) do ... end" as an alternative syntax for
"for i = 1, #t do local v = t[i]; ... end". This alternative syntax was
not part of the language (no language construct) but part of its base
library. Nevertheless, it fulfilled the same task as a language
construct would do, while keeping the language itself as tiny and easily
understandable as possible. Regarding raw tables, it was no drawback to
implement the ipairs construct as part of the baselib.

(Correct me if you disagree, this is just my perception of the
underlying design principles.)
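
The equivalence described above can be sketched for a raw table without
holes. The helper names collect_ipairs and collect_numeric are made up
for this illustration and are not part of any library:

```lua
-- Collect "index=value" strings using the library-level ipairs iterator.
local function collect_ipairs(t)
  local out = {}
  for i, v in ipairs(t) do
    out[#out + 1] = i .. "=" .. tostring(v)
  end
  return out
end

-- Collect the same strings using a numeric for loop bounded by #t.
local function collect_numeric(t)
  local out = {}
  for i = 1, #t do
    local v = t[i]
    out[#out + 1] = i .. "=" .. tostring(v)
  end
  return out
end

local t = {"Hello", "World"}
print(table.concat(collect_ipairs(t), ", "))
print(table.concat(collect_numeric(t), ", "))
```

Both calls print the same line here. The two forms are only expected to
agree for a proper sequence, since ipairs stops at the first nil value.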

The behavior was not coherent regarding userdata values, though:
In Lua 5.1, userdata values may have a length (through __len) as well as
values associated with numerical indices (via __index), but it was not
possible to iterate through them using ipairs(...).

Lua 5.2 fixed this incoherency by introducing the __ipairs metamethod,
which allows userdata values to act like sequences (not just with regard
to the # operator but also with regard to ipairs).

Additionally, the __len metamethod is also respected by tables since
Lua 5.2, making it possible for either a proxy table or a userdata
value to completely behave like a sequence to the outside world.
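
A minimal sketch of such a proxy table, assuming Lua 5.2 or later
(make_proxy is a hypothetical helper written for this example):

```lua
-- Hypothetical helper: wrap a table in an empty proxy that forwards
-- length and index lookups to the hidden backing table.
local function make_proxy(backing)
  return setmetatable({}, {
    __index = function(_, key) return backing[key] end,
    __len = function() return #backing end,
  })
end

local proxy = make_proxy({"Hello", "World"})

-- Requires Lua 5.2 or later: Lua 5.1 ignores __len on tables.
for i = 1, #proxy do
  print(i, proxy[i])
end
```

The proxy itself stores nothing; every access goes through the
metamethods, which is what lets it look like a sequence from outside.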

Now, when I try to answer my question on the semantics of ipairs(...)
(see my quote above), I would say that (since Lua 5.2) ipairs is some
sort of interface: it allows programs to iterate through "sequence-like"
containers. In other words: If your processing function uses the ipairs
iterator, then it doesn't matter whether you pass a sequence in the form
of a table or in the form of a userdata: userdata values, raw tables,
and proxy tables all behave the same when accessed through ipairs. Let
me give you an example:

function print_entries(t)
  for i, v in ipairs(t) do
    print("Entry #" .. tostring(i) .. ": " .. tostring(v))
  end
end

I may pass a raw table to print_entries, e.g.:

print_entries({"Hello", "World"})

But I may also pass some userdata value to print_entries, e.g.:

print_entries(sql:query("SELECT name FROM person"))
-- where sql:query(...) returns a userdata value

The function print_entries is polymorphic. It accepts any value
that supports the ipairs interface (through its __ipairs metamethod).
The print_entries function does not need to be aware of a special
iterator function that's part of some SQL library (or another iterator
function that's part of an LDAP library, or yet another function
that's part of a JSON library, etc.). Instead it just uses the common
ipairs interface.

That's great, isn't it?
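
Since sql:query(...) is not available here, the following sketch uses a
proxy table in its place. It assumes Lua 5.2, where ipairs consults
__ipairs; make_sequence and proxy_step are hypothetical names. The proxy
also defines __index, so plain indexed access works as well:

```lua
-- Stateless step function: the proxy itself is the only state passed along.
local function proxy_step(proxy, i)
  i = i + 1
  local v = proxy[i]
  if v ~= nil then
    return i, v
  end
end

-- Hypothetical constructor for a sequence-like proxy table.
local function make_sequence(backing)
  return setmetatable({}, {
    __index = function(_, key) return backing[key] end,
    __len = function() return #backing end,
    -- Consulted by ipairs in Lua 5.2.
    __ipairs = function(self) return proxy_step, self, 0 end,
  })
end

local function print_entries(t)
  for i, v in ipairs(t) do
    print("Entry #" .. tostring(i) .. ": " .. tostring(v))
  end
end

print_entries({"Hello", "World"})                 -- raw table
print_entries(make_sequence({"Hello", "World"}))  -- proxy table
```

print_entries is unchanged from the example above and does not need to
know which kind of value it receives.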


Now let me get back to your question, Roberto:

I use ipairs/__ipairs for two reasons:

* To allow json.array(some_sequence_like_container) to accept any value
  that behaves like a sequence. Internally, json.array(...) looks for an
  __ipairs metamethod to iterate over the argument passed to it.
  I could also use __len and numeric key access using lua_pushinteger
  and lua_gettable, I guess.

* To let return values of json.import(str) or json.array(...) behave
  like a sequence. Regarding my example above, I may write:
  print_entries(json.import('["Hello", "World"]'))

If json.import returned some data structure that doesn't support the
ipairs iterator, then print_entries(json.import('["Hello", "World"]'))
wouldn't work.
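
The two strategies mentioned in the first point (looking for __ipairs,
or falling back to the length plus numeric key access) can be sketched
in plain Lua. This is only an illustration and not the code of the JSON
library, which is written in C; to_plain_table is a hypothetical name:

```lua
-- Copy any sequence-like value into a plain table.
local function to_plain_table(seq)
  local out = {}
  local mt = getmetatable(seq)
  local custom = type(mt) == "table" and mt.__ipairs
  if custom then
    -- Strategy 1: use the iterator triplet supplied by __ipairs.
    for i, v in custom(seq) do
      out[i] = v
    end
  else
    -- Strategy 2: rely on the length operator and numeric indexing.
    for i = 1, #seq do
      out[i] = seq[i]
    end
  end
  return out
end

-- A value that only offers __ipairs (illustrative data).
local data = {"a", "b"}
local wrapped = setmetatable({}, {
  __ipairs = function(self)
    return function(_, i)
      i = i + 1
      if data[i] ~= nil then
        return i, data[i]
      end
    end, self, 0
  end,
})

print(#to_plain_table(wrapped), #to_plain_table({"x", "y", "z"}))
```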

For the source code of the JSON library refer to:
http://www.public-software-group.org/mercurial/webmcp/file/tip/libraries/json/json.c


I like the ability to customize the behavior of ipairs. However, if we
assume that 

* for i, v in ipairs(t) do ... end

always should act like

* for i = 1, #t do local v = t[i]; ... end

then having to define the __ipairs metamethod appears as overhead.
Defining __len and __index is sufficient to describe a sequence-like
behavior of a proxy table or userdata value.

I can therefore understand the initial motivation to remove the
__ipairs metamethod. As I pointed out earlier, there are some
(theoretical and practical) performance issues though:

On Wed, 13 Aug 2014 22:33:23 +0200
Jan Behrens <jbe-lua-l@public-software-group.org> wrote:

> By principle, however, it is not possible to reduce this overhead,
> because the way the for-loop works in Lua, we may only pass one Lua
> value (the second value of the triplet) as state. Unless luaB_ipairs
> creates either a closure or a table that contains the length
> information (and thus the termination point of the iteration), we
> cannot remember the length and will have to redetermine it during every
> iteration step. Creating tables or closures, however, would have an
> even greater performance impact.

This is due to the fact that ipairs is a baselib function that uses
the iterator triplet construct (which in turn allows us to store only
one state variable, while we would need two state variables here). If
ipairs were a language construct that just behaved like "for i = 1, #t
do local v = t[i]; ... end", then we wouldn't have a problem here.
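
The trade-off can be made visible with a small sketch. The length
function below is a stand-in that counts how often the length is
determined; all names are illustrative:

```lua
local length_calls = 0

-- Stand-in for a potentially expensive length lookup (for example one
-- that would trigger a __len metamethod); it counts its invocations.
local function length(t)
  length_calls = length_calls + 1
  return #t
end

-- Stateless variant: only t travels as state in the iterator triplet,
-- so the length has to be determined again on every step.
local function stateless_step(t, i)
  i = i + 1
  if i <= length(t) then
    return i, t[i]
  end
end

local function iter_stateless(t)
  return stateless_step, t, 0
end

-- Closure variant: the length is determined once, at the price of
-- allocating a new closure for every loop.
local function iter_cached(t)
  local n = length(t)
  local i = 0
  return function()
    i = i + 1
    if i <= n then
      return i, t[i]
    end
  end
end

local data = {"a", "b", "c"}

length_calls = 0
for i, v in iter_stateless(data) do end
local stateless_calls = length_calls

length_calls = 0
for i, v in iter_cached(data) do end
local cached_calls = length_calls

print(stateless_calls, cached_calls)
```

For the three-element table, the stateless variant determines the length
four times (three successful steps plus the terminating one), while the
closure variant determines it once.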

Sorry to bring this up... but... consequently, maybe the global ipairs
function should be completely removed and replaced by a language
construct in the long term?

Consider:

for v = t[i] do
  print(i, v)
end

A possible counter-argument, however, has already been given above:

> On Thu, 14 Aug 2014 13:09:10 +0200
> Jan Behrens <jbe-lua-l@public-software-group.org> wrote:
> 
> > 3. Removing the __ipairs metamethod mechanism is not just about
> >    the overhead (and speed) of the # operator:
> > 
> >    a) Beside the repetitive call of luaB_ipairs, it is not
> >       possible to return a precalculated value as second item of
> >       the iterator triplet that may speed things up.

Another counter-argument might be that the ipairs interface could also
be used for sequence-like containers whose size is undetermined at the
start of the iteration process (e.g. a database cursor).
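
As a rough sketch of that case, the iterator below never asks for a
length and simply stops when no further row is delivered. make_cursor is
a hypothetical stand-in for a real database cursor:

```lua
-- Hypothetical stand-in for a database cursor: rows are handed out one
-- at a time and the total count is never exposed.
local function make_cursor(rows)
  local pos = 0
  return {
    fetch = function()
      pos = pos + 1
      return rows[pos]
    end,
  }
end

-- Step function: fetch the next row and stop on nil.
local function cursor_step(cursor, i)
  local row = cursor.fetch()
  if row ~= nil then
    return i + 1, row
  end
end

local function each_row(cursor)
  return cursor_step, cursor, 0
end

local names = {}
for i, name in each_row(make_cursor({"Alice", "Bob"})) do
  names[i] = name
end
print(#names, names[1], names[2])
```

A loop of the form "for i = 1, #t do ... end" could not express this,
because the upper bound would have to be known before the first step.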


Kind Regards
Jan Behrens