• Subject: Re: Thoughts on {...} and tbl[nil]
• Date: Fri, 8 Jun 2018 07:48:00 +0200

> From: lua-l-bounces@lists.lua.org <lua-l-bounces@lists.lua.org> On behalf of Egor Skriptunoff
> Sent: Thursday, 7 June 2018 07:35
> To: Lua mailing list <lua-l@lists.lua.org>
> Subject: Re: Thoughts on {...} and tbl[nil]
>
> I think Rodrigo is correct.
> We don't really need NIL_IN_DICTIONARY, we only need NIL_IN_ARRAY.
> And the "array border" (or "high water mark") is the right concept to solve the problem.
>
> Here is one more (very similar) suggestion for 5.4 which doesn't require new syntax.
> Every Lua table must store an 8-byte integer "high water mark" (the biggest positive integer key ever assigned, even if the assigned value was nil).
> The user can read it with the function table.hwm(t):
>
> local t = {nil, nil}
> assert(table.hwm(t) == 2)
> t[math.pi] = 42
> assert(table.hwm(t) == 2)
> t[5] = 42
> assert(table.hwm(t) == 5)
> t[5] = nil
> assert(table.hwm(t) == 5)
> t[100] = nil
> assert(table.hwm(t) == 100)
>
> As you can see, the logic of hwm(t) is completely independent of #t.
> We have both the traditional length #t and the array-ish length hwm(t) simultaneously,
> which solves our problem with nils in arrays.
>
> How arrays with nils should be traversed in 5.4:
> for k = 1, table.hwm(array) do ... end

I'd suggest making setting the hwm a more explicit operation, because otherwise
you're missing out on a (IMO) crucial optimization: the fixed-size allocation of the
array part of the table.
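A minimal sketch of what such an explicit operation could look like (table.sethwm is
purely hypothetical, named here only for illustration; table.hwm is the function from
the quoted proposal; keep and compute are placeholder application functions). The idea
mirrors what lua_createtable(L, narr, nrec) already lets C code do: size the array
part up front.

```lua
local t = {}
table.sethwm(t, 1000)   -- hypothetical: set hwm AND preallocate 1000 array slots

for k = 1, 1000 do
  if keep(k) then
    t[k] = compute(k)   -- holes simply stay nil; hwm remains 1000
  end
end

-- Traversal is then a linear scan over the array part, no hash probes:
for k = 1, table.hwm(t) do
  local v = t[k]
  if v ~= nil then
    -- process v
  end
end
```

With an explicit call like this, the implementation knows the intended size before
the first assignment, instead of having to guess it from the key pattern.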

You can't automatically create a fitting linear array part on every assignment to a
larger positive integer key, as there are surely many tables with very few entries
but at least one huge integer key.

Also, without this optimization, iterating over such arrays will probably become
by far the slowest way to iterate over a Lua table, as you need a hash lookup for
every single access.
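To make the cost concrete, here is the proposed traversal next to the existing one
(table.hwm is the proposed function from the quoted mail, not a real API; process is
a placeholder):

```lua
-- Proposed traversal: each index is looked up individually. If the table
-- has no materialized array part, every t[k] is a hash probe.
for k = 1, table.hwm(t) do
  local v = t[k]
  if v ~= nil then process(v) end
end

-- Existing traversal: ipairs walks the array part linearly and stops at the
-- first nil. That is exactly why it can't represent holes, but also why it
-- is fast: no hashing, just sequential slot access.
for _, v in ipairs(t) do
  process(v)
end
```

The first loop only matches the second one's speed if the implementation was given a
chance to build a linear array part of the right size.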

If you don't make it an explicit operation, you also miss the opportunity to
allocate the array part at exactly the required size, enabling a much better memory
usage pattern in code that uses lots of arrays and actually knows the required sizes
in advance (which should be a lot of code already, and even more devs will think
about their data structures a bit more if there's real memory/performance to be
gained).

Without somehow enabling these additional optimizations, all you'd gain is not
having to store the current hwm value yourself: you're left with a regular hashed
table and an iteration over a known size, doing a hash lookup for every single
index, whether there is an entry at that index or not. You're even losing the
existing optimization of ipairs or a simple manual integer iteration.