Re: A tricky way to determine Lua version (updated)

This is not a "collision" in the hashed part but still a collision as only one item will be accessed. The hashed part should not contain any key whose value is an integer in the sequential range of the array part, otherwise Lua would not know where to lookup the value. May be in the first stage if you write {[1] = true, [2] = true, [3] = true} it creates an empty array part but a 4-slots hash part (including one free slot) for the 3 hashed integers.

If you use {true, [2] = true, [3] = true}, there's one key in the array part (for the sequential range 1 to 1) and a 2-slots hash part (full) or larger 4-slots hash part (if ever the integer indexes [2], and [3] are mapped to the same hash value and there's a tradeoff to avoid using hashed parts that are 100% full or near 100%, where collision chains are very likely. In that case there's still no collision between the sequential range part and the hashed part that automatically excludes all keys that are in the sequential part.

The exact tuning however for the hashed part (the maximum fill level) is left to be implementation dependant.

As well, which part to the table to use (sequential or hashed) is to be left implementation dependant: a compiler may still decide to allocate more keys in the sequential range in order to reduce the fill level of the hashed part and reduce the level of collision chains in that last part. This also saves memory because the transfered integer keys that can grow the seize of the sequential array part will no longer have to be stored separately in hash nodes, and the hashed part could be compressed better or the unused nodes may be already free for other keys: this compression could occur for example at end of evaluation of the table constructor, when all keys are known.

I was however not concerned much about that tuning, but about the evaluation order: I spoke about collision of keys meaining that the same key value [1] (independantyly of its location in the array part of the hashed part) is reassigned with distinct values and the evaluation order (of key expressions and value expressions, and assignment of keys in the table should be in a well defined order, so that we can know which value will be returned. This is still unspecified, and independant of the internal tuning of the table (between what should be in the sequential range or what should be in the hashed part).

Apparently the sequential range part (array) is only used by table constructors for keys that have no explicit expressions but incremented automatically. And still what is the result of t={'a', [1]='b'} if we query t[1]? 'a' or 'b'? Lua can only store one of these two values, never both simultaneously, otherwise the result would be desastrous (imagine what would happens it you set t[1] = nil, and query again t[1] and sees that the result is not nil but the second value that was left in the other part when only one part had its value cleared...). Clearly the Lua engine has to decide each time which part to use, and it will normally try to use the sequential part first (because it is faster) before hashing the index and using the hashed part.

Then if the hashed part is full (or its collision chains have exceeded a maximum length or the fill level exceeds some threshold like 85%), Lua has to make a choice: it can either

- grow the sequential range of the array and transfer some integer keys already present in the hashed part into a larger range (this does not require any rehashing, just a single realloc for the larger sequential range where there can be a resonnable number of keys with non nil values transfered as long as this part remains at least about 50% used with non-nil values),

- or grow the hash part (which is costlier in memory and requires full rehashing).

As well when many keys are cleared in the table (either in the array part or the hashed part, so that this part falls below some fill level (e.g. below 25%) it can decide to reduce this part, or redistribute some keys from one part to the other to maintain the minimum fill level, by transfering few key/values pairs: one part may grow in size while the other part may be reduced, but growing the size of the sequential array is much less costly than the reverse. A tuning decision has to be made in the implementation, but this should not change the behavior and the resulting values left in the table after initializing it with a table constructors or changing it later with assignments of invidual keys.

So I don't see why t={a', [1]='b'} would return different value 'a' or 'b' for t[1], independantly of the implementation and why using t={[1]='a', [1]='b'} would make any difference: this should never have any "visible" effect on queried values (beside the effect on performance and memory cost), because the key values are still exactly the same: the order of evaluation for keys or values or the order on internal assignments (possibly colliding on one or the other part) should be the only thing that matters.

Le mer. 22 juil. 2020 à 14:05, Dmitry Meyer <me@undef.im> a écrit :

> if ({false, [1] = true})[1] then
> return 'LuaJIT'
>
> I'm not sure that the evaluation of the table constructor is documented
> correctly. Here you assign two different boolean values to the same
> index [1], this creates a collision and the evaluation order matters.
>

As I understand, there is no collision.

{item1, ...} stores items in the array part
{[idx1] = item1, ...} _always_ stores items in the hash part (even if
you use integer keys without holes)

From "Lua performance tips":

> If you write something like{[1] = true, [2] = true, [3] = true}, however, Lua is not smart enough to detect that the given expressions (literal numbers, in this case) describe array indices, so it creates a table with four slots in its hash part, wasting memory and CPU time

$ luac -p -l - <<< 'local t = ({false, [1] = true})'

main <stdin:0,0> (7 instructions at 0x565552f68ae0)
0+ params, 2 slots, 1 upvalue, 1 local, 1 constant, 0 functions
1 [1] PREPARATORY 0
2 [1] NEWTABLE 0 1 1 ; 1
3 [1] EXTRAARG 0
4 [1] LOADFALSE 1
5 [1] SETI 0 1 0k ; true
6 [1] SETLIST 0 1 0
7 [1] RETURN 1 1 1 ; 0 out