lua-users home
lua-l archive



I have also pondered some alternative implementations of calculating table length.
This is the best I came up with, other than what's already implemented in Lua:

https://github.com/krka/kahlua2/blob/master/core/src/se/krka/kahlua/vm/KahluaArray.java
(Apologies to any Java haters, and for the bad indentation)

Never mind the zero-indexing and all the usual Java boilerplate; here are my basic rules for length:

* Maintain an integer variable LEN with the current best guess length.
* Maintain a boolean flag DIRTY to determine if the current length is possibly too high.

* for every call to rawset with a non-nil value at an index larger than LEN, set LEN = index
* for every call to rawset with a nil value at an index equal to LEN, set DIRTY = true
* for every call to length:
   if DIRTY = false: return LEN
   if DIRTY = true: loop down from LEN until you hit a non-nil value in the table, update LEN and set DIRTY = false
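The rules above can be sketched as a small model. This is a hypothetical Python rendering (the actual implementation is the Java KahluaArray linked earlier; the class and method names here are my own invention for illustration):

```python
class LazyLengthTable:
    """Array-like table that tracks a best-guess length lazily,
    following the LEN/DIRTY rules described above."""

    def __init__(self):
        self._data = {}      # index -> value; an absent key means nil
        self._len = 0        # LEN: current best guess for the length
        self._dirty = False  # DIRTY: True if LEN may now be too high

    def rawset(self, index, value):
        if value is None:
            # storing nil removes the entry; if it was the border
            # element, the cached length may be stale
            self._data.pop(index, None)
            if index == self._len:
                self._dirty = True
        else:
            self._data[index] = value
            if index > self._len:
                self._len = index

    def length(self):
        if self._dirty:
            # scan down from LEN until we hit a non-nil value
            while self._len > 0 and self._len not in self._data:
                self._len -= 1
            self._dirty = False
        return self._len
```

Note that the scan cost is only paid when the length is actually queried after the border element was removed; repeated sets and nils without an intervening length call cost O(1) each.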

This strategy actually works pretty nicely for most use cases I've encountered; the big problem arises if you do something like this:

while true do
  t[10000000] = 1
  t[10000000] = nil
  local x = #t
end
  
That would trigger a lot of scans, but the typical use cases for the length operator all seem to involve mostly-filled arrays where the length doesn't vary much, or at least not by such extreme values.


Of course, this all gets more complicated when you start considering weak tables (at least in Java, where there is no usable callback)...

On Fri, Dec 3, 2010 at 2:44 PM, Javier Guerra Giraldez <javier@guerrag.com> wrote:
On Fri, Dec 3, 2010 at 7:16 AM, Hisham <hisham.hm@gmail.com> wrote:
> [*] Of course, this is a minor problem compared to explaining the
> weird behavior of the # operator. I predict it will be changed to
> return the actual number of elements of a table sometime around Lua
> 7.0 (when the argument that not maintaining a counter saves precious
> memory and processing won't be as compelling)

this has been discussed at length, but I don't think most people
realize that it's not that easy.

imagine that every table keeps a 'proper' length field, and #t returns it:

t={1,2,3}     => #t = 3
t[2] = nil      => #t = 3
t[3] = nil      => #t = ...?  should be 1, right?

hmm... the core would have to scan backwards, skipping 2, to set it to 1

now:
t = {1,2}             => #t = 2
t[1000000] = 3   => #t = 1000000
t[500] = 4           => #t = 1000000
t[1000000] = nil  => #t = ....?  should be 500, right?

now it doesn't seem so cheap to just keep a counter and update it
_every_ time you set a member to nil.

think about it: array-like handling suddenly gets O(n) for some very
common operations, instead of the current O(1) for most operations,
with O(log n) for a single, relatively low-usage one.
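To make the cost concrete, here is a hypothetical Python sketch of an *eager* counter: it keeps #t exact by scanning backwards immediately whenever the border element is set to nil (names are mine, for illustration only):

```python
class EagerLengthTable:
    """Keeps an exact length at all times by scanning backwards
    whenever the border element is removed -- making that rawset
    O(n) instead of O(1)."""

    def __init__(self):
        self._data = {}  # index -> value; an absent key means nil
        self._len = 0    # exact length, maintained eagerly

    def rawset(self, index, value):
        if value is None:
            self._data.pop(index, None)
            if index == self._len:
                # must scan backwards *now* to keep the length exact
                while self._len > 0 and self._len not in self._data:
                    self._len -= 1
        else:
            self._data[index] = value
            if index > self._len:
                self._len = index

    def length(self):
        return self._len  # always exact, O(1) to read
```

Running Javier's example through this model, the assignment `t[1000000] = nil` is where the million-step backward scan happens, inside a plain table write rather than inside the length operator.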



--
Javier