Javier,
 
> excessive collisions, since you don't know the exact subset of
> characters sampled

This is the code I see in lstring.c (https://github.com/lua/lua/blob/master/lstring.c):

unsigned int luaS_hash (const char *str, size_t l, unsigned int seed,
                        size_t step) {
  unsigned int h = seed ^ cast_uint(l);
  for (; l >= step; l -= step)
    h ^= ((h<<5) + (h>>2) + cast_byte(str[l - 1]));
  return h;
}


From this code it is clear that the subset of characters sampled depends only on the length of the string and the step, not on the seed.
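
To make this concrete, here is a minimal sketch that replays only the loop header of luaS_hash and records which indices it reads. The helper name sampled_indices and its out/max parameters are my own invention for illustration, they are not part of Lua.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Lua: records the indices that the loop
   in luaS_hash reads for a string of length l and a given step.
   Writes at most max entries to out and returns the number of indices
   the loop would read. step must be at least 1. Note that no seed is
   involved anywhere. */
size_t sampled_indices(size_t l, size_t step, size_t *out, size_t max) {
  size_t n = 0;
  for (; l >= step; l -= step) {  /* same loop header as luaS_hash */
    if (n < max)
      out[n] = l - 1;             /* luaS_hash reads str[l - 1] here */
    n++;
  }
  return n;
}
```

For l = 10 and step = 3 this reports indices 9, 6 and 3, so the characters at 0, 1, 2, 4, 5, 7 and 8 never reach the hash.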

Therefore, if you build strings that differ only in characters that are not sampled, you are going to get the same number of collisions whatever the seed.

Only the buckets where the collisions happen will change, depending on the seed.
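
A rough way to check this is sketched below. hash_sketch is a copy of the loop quoted above under a different name, the two macros are stand-ins that assume Lua's cast_uint and cast_byte are plain casts, and collide_for_all_seeds is a hypothetical helper of mine.

```c
#include <stddef.h>

/* Stand-ins for Lua's cast macros (assumption: they are plain casts). */
#define cast_uint(i) ((unsigned int)(i))
#define cast_byte(i) ((unsigned char)(i))

/* Copy of the quoted loop, renamed so it does not clash with Lua. */
unsigned int hash_sketch(const char *str, size_t l, unsigned int seed,
                         size_t step) {
  unsigned int h = seed ^ cast_uint(l);
  for (; l >= step; l -= step)
    h ^= ((h << 5) + (h >> 2) + cast_byte(str[l - 1]));
  return h;
}

/* Hypothetical helper: returns 1 if a and b (both of length l) get the
   same hash for every seed in 0 .. nseeds-1, and 0 otherwise. */
int collide_for_all_seeds(const char *a, const char *b, size_t l,
                          size_t step, unsigned int nseeds) {
  unsigned int seed;
  for (seed = 0; seed < nseeds; seed++)
    if (hash_sketch(a, l, seed, step) != hash_sketch(b, l, seed, step))
      return 0;
  return 1;
}
```

With l = 10 and step = 3, "abcdefghij" and "Xbcdefghij" differ only at index 0, which is not sampled, so collide_for_all_seeds returns 1 for any range of seeds. With step = 1 every character is sampled and the same pair no longer collides.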

    Andrea

--
Andrea Vitali