Re: Thoughts on {...} and tbl[nil]

On 06/07/2018 08:34 AM, Egor Skriptunoff wrote:
>
> > I think Rodrigo is correct.
> > We don't really need NIL_IN_DICTIONARY, we only need NIL_IN_ARRAY.
> > And the "array border" (or "high water mark") is the right word to solve
> > the problem.
> >
> > Here is one more (very similar) suggestion for 5.4 which doesn't require new
> > syntax.
> > Every Lua table must store an 8-byte integer "high water mark" (the biggest
> > non-zero positive integer key assigned).
> > The user can read it with the function table.hwm(t).
> >
> > [...]
> >
> > As you see, the logic of hwm(t) is completely independent of #t.
> > We have both traditional length #t and array-ish length hwm(t)
> > simultaneously,
> > that solves our problem with nils in arrays.
> >
> > How arrays with nils should be traversed in 5.4:
> > for k = 1, table.hwm(array) do ... end
> >
> > [...]
> >
> > -- Egor
>
> Wow, I like this proposal!
>
> I imagine this as a __newindex hook that sets a hidden "hwm" field to the maximum
> of the current "hwm" and the integer key value. Small overhead, almost free.
>
> > Questions:
> > 1) Is table.sethwm(t, newhwm) needed?
> Yes. A water mark stored this way should not be immutable.
>
> > 2) Should ipairs() use table.hwm() internally? Is a new function hpairs()
> > needed?
> I think not; otherwise it would break existing Lua code. Introduce "hpairs()" instead.
>
> > 3) Is __hwm metamethod needed?
> Probably yes. What parameters would it take?
>
> -- Martin
>
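For concreteness, Martin's __newindex idea can be approximated in current Lua (5.3 or later) with a proxy table. This is only a sketch under that assumption: new_hwm_table, hwm, sethwm and hpairs are hypothetical names standing in for the proposed table.hwm/table.sethwm and Martin's hpairs(). Egor's proposal would store the mark inside every table; here it lives in a weak side table. The proxy is kept empty because __newindex fires only for keys absent from the indexed table, so every write, including t[k] = nil, reaches the hook.

```lua
-- Sketch only (Lua 5.3+): new_hwm_table, hwm, sethwm and hpairs are
-- hypothetical names, not part of any Lua release.
local marks = setmetatable({}, { __mode = "k" })  -- proxy -> high water mark

local function new_hwm_table()
  local store = {}   -- holds the actual values
  local proxy = {}   -- stays empty, so every assignment hits __newindex
  marks[proxy] = 0
  return setmetatable(proxy, {
    __index = store,
    __newindex = function(t, k, v)
      -- raise the mark on any larger integer key, even when v is nil
      if math.type(k) == "integer" and k > marks[t] then
        marks[t] = k
      end
      store[k] = v
    end,
    -- the traditional #t keeps its current meaning
    __len = function() return #store end,
  })
end

local function hwm(t) return marks[t] end

-- overwrites the mark; stored values are left untouched
local function sethwm(t, n) marks[t] = n end

-- visits k = 1 .. hwm(t), yielding nil values as well
local function hpairs(t)
  local k, last = 0, hwm(t)
  return function()
    if k < last then
      k = k + 1
      return k, t[k]
    end
  end
end

local t = new_hwm_table()
t[1] = "a"
t[3] = nil            -- stores nothing, but raises the mark
print(#t, hwm(t))     --> 1  3
for k, v in hpairs(t) do print(k, v) end
```

Here sethwm simply overwrites the mark, which is one possible answer to question 1; whether lowering the mark should also clear values above it is left open.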

Let's try an even simpler model (an alternative to a specific table constructor):

Definitions: t is a table

0) a 'sequence' is a contiguous set of integer keys with non-nil values
1) #t (rawlen) operator: the biggest positive integer key of the sequence starting at key 1 [1]
2) t# (rawborder) operator: the biggest positive integer key ever assigned (rawset), even when the assigned value is nil

Examples:

t = {1,2,3,4,5,nil,nil,8,nil} -- two sequences
#t is 5
t# is 9

t = {nil,2,3,nil,nil} -- one sequence
#t is 0
t# is 5
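The definitions and examples above can be checked against current Lua (5.2+) under two assumptions: a hypothetical seqlen() implements the proposed #t by walking up from key 1, and the .n field recorded by table.pack stands in for t#, since table.pack counts trailing nils just as a rawborder would. The built-in # is deliberately not used, because on tables with holes the manual allows it to return any border.

```lua
-- seqlen(t): hypothetical helper modelling the proposed #t,
-- the last key of the run of non-nil values starting at key 1.
local function seqlen(t)
  local n = 0
  while rawget(t, n + 1) ~= nil do
    n = n + 1
  end
  return n
end

-- t.n (set by table.pack) plays the role of the proposed t#
local a = table.pack(1, 2, 3, 4, 5, nil, nil, 8, nil)
print(seqlen(a), a.n)        --> 5  9

local b = table.pack(nil, 2, 3, nil, nil)
print(seqlen(b), b.n)        --> 0  5

-- 'proper sequence' check: #t == t#
local c = table.pack(1, 2, 3)
print(seqlen(c) == c.n)      --> true
```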

Thus, as expected and compatibly with current versions:

1) a 'proper sequence' is a table where #t == t# is true.
2) pairs() iterates non-nils, as expected.
3) ipairs() iterates integer keys from 1 until the first nil value, as expected.
4) table.(un)pack() are now 'border symmetric' [2].
5) table.insert/remove() are only meaningful for sequences starting from 1, and their behavior doesn't change.
6) 'numeric for' works as expected.
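Items 2), 3), 4) and 6) can be illustrated on current Lua (5.2+) with a packed table, again treating table.pack's .n field as a stand-in for the proposed t# (an assumption of this sketch, not part of the proposal itself).

```lua
-- t.n (from table.pack) stands in for the proposed t#
local t = table.pack("a", nil, "c", nil)

-- 2) pairs() visits only non-nil values: keys 1, 3 and "n"
local count = 0
for _ in pairs(t) do count = count + 1 end

-- 3) ipairs() stops at the first nil, so only key 1 is visited
local seen = 0
for _ in ipairs(t) do seen = seen + 1 end

-- 4) unpacking up to the border returns all four values, nils included
local function nargs(...) return select("#", ...) end
local back = nargs(table.unpack(t, 1, t.n))

-- 6) a numeric for over 1 .. border also visits the holes
local visited = 0
for _ = 1, t.n do visited = visited + 1 end

print(count, seen, back, visited)   --> 3  1  4  4
```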

Remarks:
1) tables remain memory/CPU efficient for sparse and dense objects.
2) t[t#+1] = nil always increases the rawborder, as expected.

The objective problems Roberto is trying to solve [3]:
-------
1) A constructor like {x, y, z} should always create a sequence with
three elements. A constructor like {...} should always get all arguments
passed to a function. A constructor like {f(x)} should always get
all results returned by the function. '#' should always work on these
tables correctly.

2) A statement like 't[#t + 1] = x' should always add one more element
at the end of a sequence.

These are what confuse people all the time, these are what start new
rounds of discussions around '#'.
--------
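To make the two problems concrete, this is how current Lua (5.2+) behaves. f is a hypothetical helper; the checks only bound #t, because the manual lets # return any border of a table with holes, and {1, nil, 3} has two borders (1 and 3).

```lua
-- problem 1: {...} keeps the values, but # cannot recover the count
local function f(...)
  local t = {...}
  return t, select("#", ...)
end

local t, n = f(1, nil, 3)
-- n is 3, but #t may be 1 or 3: both are borders of {1, nil, 3}
print(n, #t == 1 or #t == 3)        --> 3  true

-- problem 2: t[#t + 1] = x appends after whichever border # picked,
-- so "x" lands at key 2 or at key 4
t[#t + 1] = "x"
print(t[2] == "x" or t[4] == "x")   --> true
```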

It's all OK.

If you expect a 'proper sequence', use #t.
If 'nils' are important to you, just use t# instead.
If you don't know, check #t == t# (but you usually know, OK?)

It also separates the logic concerning nils in arrays from the current behavior, which is good and easy to spot in the code.

I think we can use the border t# to solve these problems and keep full compatibility with current versions at the cost of a new and simple operator.

Everybody happy?

[1] This is compatible with the current length heuristics for 'sequences', namely the only case where #t is useful for tables. It could be improved by considering only the length of a sequence starting from 1, if one exists.
[2] Avoiding the '.n' wart.
[3] http://lua-users.org/lists/lua-l/2018-03/msg00239.html

--
Rodrigo Azevedo Moreira da Silva