lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]




On Saturday, August 16, 2014, Jan Behrens <jbe-lua-l@public-software-group.org> wrote:
On Fri, 15 Aug 2014 10:50:04 -0700
Coda Highland <chighland@gmail.com> wrote:

> On Fri, Aug 15, 2014 at 10:47 AM,
> Doug Currie <doug.currie@gmail.com> wrote:
> >
> > On Fri, Aug 15, 2014 at 1:16 PM,
> > Jan Behrens <jbe-lua-l@public-software-group.org> wrote:
> > >
> > > On Fri, 15 Aug 2014 13:37:41 -0300
> > > Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
> > >
> > > > If the goal is only to reduce calls to length, another option would be
> > > > to modify ipairs so that it stops in the first nil entry with an index
> > > > larger than #t. That would have some nice properties:
> > >
> > > It sounds like a "hack" to me.
> >
> > It is very clever. I like it.
>
> I like it too!


Maybe I should explain what I meant with "hack":
it helps only if nil is an exception.

In all other cases it doesn't speed things up.


I couldn't resist to do a further benchmark for demonstration (even if
I think it's more important to clarify the semantics of ipairs first).

The benchmark implements a sparse array in Lua (Note: In C this
benchmark might produce different results).

============================================================
sparse-array.lua:

do
  local function sparse_ipairsaux(t, i)
    if i < t.len then
      i = i + 1
      local v = rawget(t, i)
      if v == nil then
        return i, t.default
      else
        return i, v
      end
    else
      return nil
    end
  end

  local sparse_metatable = {
    __index = function(t, k)
      if type(k) == "number" and k > 0 and k <= t.len then
        return t.default
      else
        return rawget(t, k)
      end
    end,
    __len = function(t) return t.len end,
    __ipairs = function(t)
      return sparse_ipairsaux, t, 0
    end
  }

  function sparse_array(entries, len, default)
    return setmetatable({
      entries = entries,
      len     = len,
      default = default
    }, sparse_metatable)
  end
end

do
  local field = sparse_array({}, 10000, nil)
  field[3]    = "a"
  field[2520] = "b"
  field[6667] = "c"

  local string_count = 0

  for i = 1, 1000 do
    for j, v in ipairs(field) do
      if type(v) == "string" then string_count = string_count + 1 end
    end
  end

  print(string_count .. " strings found")
end
============================================================


Running this program with different approaches, I get the following run-time:


Lua 5.3.0-alpha without LUA_COMPAT_5_2:
27.481 seconds

Lua 5.3.0-alpha with LUA_COMPAT_IPAIRS:
22.867 seconds

Lua 5.3.0-alpha with a cached length (see patch below):
24.720 seconds

Lua 5.3.0-alpha with Roberto's approach (see patch below) to only
determine the length when a nil is encountered:
28.181 seconds


Here, Roberto's approach would be slowest.

Note that the inner loop contains a function call type(v), so it's not
that surprising that the numbers are similar to each other.


Patch to simulate cached length:

--- lua-5.3.0-alpha/src/lbaselib.c      2014-07-24 21:33:29.000000000 +0200
+++ lua-5.3.0-alpha-staticlen/src/lbaselib.c    2014-08-15 17:17:32.078783582 +0200
@@ -260,7 +260,11 @@
 */
 static int ipairsaux (lua_State *L) {
   int i = luaL_checkint(L, 2) + 1;
-  if (i > luaL_len(L, 1)) {  /* larger than length? */
+  int maxi;
+  lua_settop(L, 2);
+  lua_pushinteger(L, 10000);
+  maxi = luaL_checkint(L, 3);
+  if (i > maxi) {  /* larger than length? */
     lua_pushnil(L);  /* end traversal */
     return 1;
   }


Patch to only check length when nil is encountered (Roberto's idea):

--- lua-5.3.0-alpha/src/lbaselib.c      2014-07-24 21:33:29.000000000 +0200
+++ lua-5.3.0-alpha-lenhack/src/lbaselib.c      2014-08-16 11:36:59.485253075 +0200
@@ -260,14 +260,12 @@
 */
 static int ipairsaux (lua_State *L) {
   int i = luaL_checkint(L, 2) + 1;
-  if (i > luaL_len(L, 1)) {  /* larger than length? */
-    lua_pushnil(L);  /* end traversal */
-    return 1;
-  }
-  else {
-    lua_pushinteger(L, i);
-    lua_pushinteger(L, i);  /* key for indexing table */
-    lua_gettable(L, 1);
+  lua_pushinteger(L, i);
+  lua_pushinteger(L, i);  /* key for indexing table */
+  lua_gettable(L, 1);
+  if (lua_isnil(L, -1) && i > luaL_len(L, 1)) {  /* check len only if nil */
+    return 1;  /* end traversal */
+  } else {
     return 2;
   }
 }


Output of my benchmark:

without LUA_COMPAT_5_2:
~/lua_ipairs_test/lua53 % time bin/lua ../sparse-array.lua
3000 strings found
27.481u 0.000s 0:28.26 97.2%    258+172k 0+0io 0pf+0w

with LUA_COMPAT_IPAIRS:
~/lua_ipairs_test/lua53compat % time bin/lua ../sparse-array.lua
3000 strings found
22.867u 0.000s 0:23.44 97.5%    258+172k 0+0io 0pf+0w

lua-5.3.0-alpha-staticlen:
~/lua_ipairs_test/lua53patched % time bin/lua ../sparse-array.lua
3000 strings found
24.720u 0.000s 0:25.17 98.2%    258+172k 0+0io 0pf+0w

lua-5.3.0-alpha-lenhack:
~/lua_ipairs_test/lua53lenhack % time bin/lua ../sparse-array.lua
3000 strings found
28.181u 0.000s 0:29.07 96.9%    258+172k 0+0io 0pf+0w


But as I previously said: more important than these benchmarks would be
(for me) to understand the semantics of ipairs first. In particular:

Should

* for i, v in ipairs(t) do ... end

always do the same as:

* for i = 1, #t do local v = t[i]; ... end

Or should __ipairs allow specific optimizations and/or iterations over
objects that might not even have a length (or where the length can't be
determined easily).


Regards
Jan


ipairs is the only way to get ordinal retrieval, as far as I can tell. That's why it's critical, unless I'm wrong about that.  Other than that, iterators are not one of those things where I need consistency across libraries. I can always write an iterator as part of the module. 

There are many times that I wish that I could quickly retrieve all string keys, without testing for numeric keys, but this can be worked around by changing the model. 

The ability to directly overload the standard iterator that returns numeric keys in order will be missed, but only a little. It's only a little odd to ask a user to use pairs in that case and I can always write a custom iterator. 

My $0.02

-Andrew