[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: hash collisions
- From: "Brett Bibby" <research@...>
- Date: Tue, 4 Nov 2003 07:25:56 +0800
Thanks for your post, it was very helpful.
>However, if you want truly fast access to values in a Lua table,
>you are probably better off using consecutive integers starting at 1.
How do I do this? Can you give me a simple example?
Thanks again for your help, sometimes lua's lack of comments and stunted
variable naming leave me a little confused :-)
----- Original Message -----
To: "Lua list" <email@example.com>
Sent: Tuesday, November 04, 2003 6:55 AM
Subject: Re: hash collisions
> I don't think anyone responded to this:
> > I was looking at the lua source in order to understand how it
> > hashing, specifically collisions. At the top of ltable.c there is a
> > explaining that the loading does not adversely effect performance.
> > digging around the code for a while, I'm still not sure I understand
> > implements hash tables and handles collisions :-) Can anybody help
> > understand the implementation?
> The best way to check is with benchmarking, of course, but I can give
> a couple of hints.
> Basically, the hash value for a string is a function of the size of
> string and at most 32 of the bytes from the string. Other than the
> size limitation, it is a pretty standard hash function and should do
> just fine with strings created sequentially. However, if the strings
> are 32 bytes long or more, you might run into problems. In particular,
> a 32-byte string will be hashed only on every second character. The
> character, however, is always used, which ameliorates the problem for
> sequentially selected strings.
> The comments in ltable.c are pretty good, actually; it is an
> hash table implementation and works very well for small tables.
> Anyway, the idea is that the hash chain starts at the position
> indicated by the hash of the key; the rest of the nodes (if any)
> with the same hash occupy positions in the table which don't (yet)
> correspond to any key. When a new (key, value) pair is inserted,
> it may be necessary to "evict" the node which is "borrowing"
> the hash-position.
> The whole scheme is quite fast and also quite clever; of course,
> like all hash tables, it depends on the distribution of hashes, but
> I have never seen any problems with string hashes.
> In short, you would have to work at it to do a better hash.
> However, if you want truly fast access to values in a Lua table,
> you are probably better off using consecutive integers starting at 1.
> Since Lua 5, Lua tables have an array part as well as a hash part;
> the array part is specifically used to store small integer keys. If
> a key falls into the range of the array part, the access is direct.
> The big advantage here is that it is not necessary to store keys
> (or chain pointers) in the array part of the table, so there is
> less bookkeeping and less storage used up. Also, these integer
> keys do not collide with keys of any other data type. (Numeric
> keys which do not fall within the capacity of the array part
> are handled normally, so the whole mechanism is more or less
> transparent; when the table is rehashed, the array part is
> resized if necessary to guarantee at least 50% occupancy.)