Re: C API - lua_next traversal of "array" table

Well, I looked into this deeper after everyone gave me the conventional wisdom to NOT depend on `lua_next`'s iteration order and pray for no breaking changes.

It turns out that the correct wisdom -- iterate using an index value -- seems to also be the most performant wisdom from the C++ or C side. [1] compares the performance of calling `lua_next()` in a while loop and assuming it iterates in the right order, pushing a table and calling `lua_geti` with the indices (and knowing the size and indices before hand), and the other measures the doing a lua_pushinteger + lua_gettable. Measured in a computer with an i7 Intel chip, lots of ram, Windows 10, Lua 5.3.3 compiled as a DLL.

Interestingly, the pushinteger + gettable method is about equal to the lua_next implementation. So, for Lua 5.2 and lower, since there's no lua_geti intrinsically built into the API, the correct way is equal in performance to the UB way.

lua_geti was introduced in Lua 5.3 I believe, and it correctly adds a very heavy performance gain for the lookup (sorry LuaJIT and Lua 5.2 users, you're not gonna get the benefits here).

My only problem is: how do I know how big the array part is? I could use `#t` or fight with its equivalent in the Lua 5.1/2/3 code, which is either lua_objlen or lua_rawlength (I can't remember which). Right now I have a fixed size at compile-time... I could just keep iterating integer keys from 1 to infinity until I hit a snag, but that seems less than useful (and doesn't allow me to pre-allocate the size to save performance).

Any suggestions for getting the length, or is lua_objlen/rawlength (or whatever it is for each version of Lua) good enough?

[1] - https://dl.dropboxusercontent.com/u/17644642/lua%20iteration.html

On Thu, Aug 18, 2016 at 2:41 PM, nobody <nobody+lua-list@afra-berlin.de> wrote:

On 2016-08-18 19:20, Ross Berteig wrote:
> Honestly, the cost of writing a loop over integers from 1 to #t is so
> low (even in C) that you should just do that rather than assume that an
> undocumented behavior will work under all conditions.

That reminds me of the "dual" situation that I once found myself in:

Question: How do you iterate just the "non-array part" (i.e. all entries
excluding those with integer keys 1,2,... up to the first 'nil')?
(Assume that the "array part" contains millions of values (already
traversed in order by numeric for loop), and the "non-array part"
contains optional metadata, i.e. 0 or just a few values, that you now
want to inspect.)

Answer: You don't. You use two tables, one for the array-like part,
one for the metadata, and place one inside the other at a "well-known"
key (i.e. one that you can get without calling `next`). The 56 byte
cost of the extra table is negligible. (If the array part may be
smaller so that the memory matters, put the metadata in an optional
field in the array and check whether it exists or is nil.)

Non-Answers:

* Trying to jump over part of the next()-traversal by swapping the
current key passed to next from 1 to #t. Usually works, but can
break horribly, for the same reasons previously mentioned in this
thread.

* Making the keys well-known by arithmetic coding tricks, e.g. store
t[k]=v as t[i+0.25] = k, t[i+0.75] = v or just keep a list of keys
at t[i+0.5] = k and store t[k] = v normally, for i=1,2,...
Really cool but useless hack: costs 64 instead of 32 bytes per
entry (=costs more starting at 2 entries), needs own iterator, ...