lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


> The new "size" will be floor(log2(nuse*5/4))+1 (or something like that),
> so if nuse == (oldSize - a-few-slots) the hash will grow (see function
> "rehash").

Sorry, but I don't think that's right.  If you look at every occurrence of
firstfree in ltable.c you'll see that whenever there is a collision
firstfree is decremented and when it hits the bottom the table is rehashed.
At that point the table may have ANY number of entries since entries could
have been removed along the way.  Imagine filling a table of size n to 50%
and then adding and removing 25% more entries over and over with collisions.
After at most n collisions though the table is not full or empty, the table
will rehash and the last else clause below will be taken:
  if (nelems >= oldsize-oldsize/4)  /* using more than 3/4? */
    setnodevector(L, t, (lint32)oldsize*2);
  else if (nelems <= oldsize/4 && oldsize > MINPOWER2)
    setnodevector(L, t, oldsize/2);
  else
    setnodevector(L, t, oldsize);

That of course does a lot of work just to reset firstfree to the highest
empty slot.  Personally, I wouldn't even do a linear scan to reset firstfree
but would keep it updated to the highest firstfree node as you go, but that
does entail a extra test at each assignment as you mention below in
reference to the numused scan for which it is not needed.

It would be great to be able to say that all table operations are nearly
linear in non-pathological tables except when the table has to change size.
Right now that's not true of Lua's tables since they will rehash just to
reset firstfree.  At least make it a fast linear scan and not a reallocation
and reinsert of ever element.  That means that what one might think is a
nice non-growing table (like a cache of n elements) will occasionally give
set operations that take a LOT longer than others set operations.

>> 2. keep a numUsed field in the table and get rid of the numuse subroutine
>> --- downside: every table is 4 bytes larger
> 
> Another downside: you need an extra test everytime you assign a value to
> the table, to check whether it is nil.

I don't get it; this is a different linear overhead in the code from
firstfree.  You increment when you add something and decrement when you
remove something and don't do a linear scan to count elements.  Though if
you're really about to rehash (not needed in the second else clause above)
this fast linear scan is nothing compared to the slow linear reinsertion of
all the elements into the new nodevector.  This one isn't that important in
that case.

Russ