  lua-l archive

• Subject: Re: Why no for table-iterator like e. g. "hashpairs"?
• From: Philippe Verdy <verdy_p@...>
• Date: Sat, 2 Nov 2019 00:39:06 +0100

#t in your case should only return 3 (i.e. all integer indexes between 1 and 3 are set to non-nil and index 4 is nil) not 440 (which would be correct if all indexes from 1 to 440 were not nil, but this is false here for index 4; all we know is that index 441 is not-nil, but it is not the highest index whose next is not nul, as you can see at ).

How the "#t" operator is computed is crazy when tables are filled in random order, or reverse order, even when all indexes are integers only...).
Ideally there can exist an optimization in the table implementation where an array (indexed by hashes) is used for collision lists of (key,value) pairs, and a second array is used to contain values directly without any collision (but possibly containing nil values): if the second direct array of values can contain nil values, then you save a lot when setting values in the table, without having t oscan the array constantly when one allocated position is set to nil (including the case where some slots may be preallocated to nil values to save the cost of reallocations): setting a non-nil value at a positive integer key higher than the last estimation would set increase an hidden "length-max" field, setting a nil value to a positive integer key value lower than the last estimation would decrease an hidden "length-min" field.

But the operator "#t" should then be able to scan from "lengh-min" to "length-max" to determine the first positive integer position with a non-nil value which is followed by nil-value: this is the only case where a table scan would be needed, and then it would set "length-min" and "lengh-max" to that determined position, and it would be very fast.

But this also means that the integer-index can then be sparse, and the standard hash table would be used for all other keys that are not positive integers or for integers that are too large without creating a large gap. The table implementation would determine when to move some integer keys from the hash table to the array, based on a simple "fill ratio" metric (the table can knows the current maximum size of the array indexed by integer, and how many positions have non-nil values, it can also count the number of collisions per slot in the hash table and when as long this is low, there's no need to extend the number of slots, but above some fill level for collision lists, the integers present in the hash table could be moved to the integer-ba

Le ven. 1 nov. 2019 à 16:27, bil til <flyer31@googlemail.com> a écrit :
... this is a nasty example, as then the table contains nil...

Could we perhaps change the code slightly, to make my
point more clear:

t= {1, 2, 3, x=10000, y= 11111} -- just to have some nice table start
for i= 1, 1000 do
local n= math.random(1000)
t[n]= n
end

print( #t)

for i= 1, 10 do
print( i, t[i])
end

This produced the following output for #t and the table t in my case:
440
1       1
2       2
3       3
4       nil
5       5
6       6
7       7
8       8
9       nil
10      10

So this is a "really nasty table" now... it has nils distributed nicely in
its key-value range 1...1000, and #t just gives some sort of
"estimated length".

To confirm that 440 is at least "quite reasonable", you can
print the table range 335...445, then I got:
435     nil
436     nil
437     nil
438     438
439     439
440     440
441     nil
442     nil
443     443
444     nil
445     nil

... so this confirms, that by some "reaonably fast checking of #t"
(= avoiding time-eating linear search) 440 might be some
"reaonable value for #t".

... ok ...

but anyway:
here ALL the numbered keys appear nicely ordered if
I iterate with for using the pairs iterator:

I would be happy already concerning the first 440 ones,
but in fact you can test the complete table like this:

k_last= 0
for k, v in pairs(t) do
if type(k) == 'number' then
if k <= k_last then
print ("sorting error in 1..#t: " .. k)
end
k_last= k
end
end
print( k_last)

... and the result will be:
1000

... so this ran nicely up to 1000 without
any error message, all nicely sorted for the
number indices...

(but I am happy to restrict our discussion for the
range 1...#t
I am happy if we should agree,
that in this range FOR SURE the keys will arrive
in sorted order... and that these keys 1...#t
will appear, BEFORE any other keys / any hash
keys will appear...

this is really helpful to know definitely... this is NOT
an academic question without real practical output,
I hope you also agree in this.)

--
Sent from: http://lua.2524044.n2.nabble.com/Lua-l-f2524044.html

• Follow-Ups: