lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


However the nasty unbalanced hash trees only occur if you have not used a good enough hashing function; the implementation, above some table size, could arrange to use a stronger cryptographic function (e.g. SHA512, but still SHA1 is still very resistant) which will ensure a really flattened distribution, instead of a simple modular arithmetic function with the multiplication by known static prime (like the simple Knut's hashing function).

Cryptographic hashes are now very fast, and very resistant to DDOS attacks (it is really extremely difficult to "unflatten" the distribution over any hash table sise of reasonnable size). Even with SHA1 or MD5, it would be very computing intensive to create overlong collision lists desequilibrating the distribution of hash keys. The hashing function does not need to create unique keys for unique values, and the cost of of collision lists is proportional to their maximum length (which is inversely proporitional to the size of the hash table, which is turn should grow in size according to the number of keys/size in the table to index).

Let's suppose that the first level of hash table is sqrt(N) where N is the maximum number of elements in the table, then the average length of collision lists (with a flattened distribution) is sqrt(N)/2, and a good hashing function would make the worst collision list length no logner than sqrt(N). So you can easily use a second layer of hashing and then reduce the worst collision list length to O(sqrt(sqrt(N))). The storage cost for the two hashing tables (using the two hasing functions) will still be 2*O(sqrt(N)), and it is still modest compared to the cost of a B-tree (with moderate fill factor) because of the free space left in its pages.

All these designs are just based on statistics metrics that the implementation can maintain. It has many choices ! Lua however does not mandate any hashing function for its tables, so the ordering of keys wahen traversing tables with pairs() is already not predictable allowing freedom of choice. If you needed a table with sorted keys, you can already do that in Lua using a separate table of keys, ordered in a sequence (from 1 to N) independantly of the implementation of tables.

Because Lua is already optimizing the storage of table keys that form a sequence (they are not stored in the hash, but their in values are directly stored in a separate vector indexed by integer, and theses integer keys are then very fast and don't cost any additional storage, that's why ipairs() is very fast).