[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: hash collisions
- From: "Brett Bibby" <research@...>
- Date: Tue, 4 Nov 2003 07:25:56 +0800
Thanks for your post, it was very helpful.
>However, if you want truly fast access to values in a Lua table,
>you are probably better off using consecutive integers starting at 1.
How do I do this?  Can you give me a simple example?
Thanks again for your help, sometimes lua's lack of comments and stunted
variable naming leave me a little confused :-)
Brett
----- Original Message ----- 
From: <RLake@oxfam.org.pe>
To: "Lua list" <lua@bazar2.conectiva.com.br>
Sent: Tuesday, November 04, 2003 6:55 AM
Subject: Re: hash collisions
> I don't think anyone responded to this:
>
> > I was looking at the lua source in order to understand how it
handles
> > hashing, specifically collisions.  At the top of ltable.c there is a
> comment
> > explaining that the loading does not adversely effect performance.
After
> > digging around the code for a while, I'm still not sure I understand
how
> lua
> > implements hash tables and handles collisions :-)  Can anybody help
me
> to
> > understand the implementation?
>
> The best way to check is with benchmarking, of course, but I can give
you
> a couple of hints.
>
> Basically, the hash value for a string is a function of the size of
the
> string and at most 32 of the bytes from the string. Other than the
> size limitation, it is a pretty standard hash function and should do
> just fine with strings created sequentially. However, if the strings
> are 32 bytes long or more, you might run into problems. In particular,
> a 32-byte string will be hashed only on every second character. The
last
> character, however, is always used, which ameliorates the problem for
> sequentially selected strings.
>
> The comments in ltable.c are pretty good, actually; it is an
interesting
> hash table implementation and works very well for small tables.
>
> Anyway, the idea is that the hash chain starts at the position
> indicated by the hash of the key; the rest of the nodes (if any)
> with the same hash occupy positions in the table which don't (yet)
> correspond to any key. When a new (key, value) pair is inserted,
> it may be necessary to "evict" the node which is "borrowing"
> the hash-position.
>
> The whole scheme is quite fast and also quite clever; of course,
> like all hash tables, it depends on the distribution of hashes, but
> I have never seen any problems with string hashes.
>
> In short, you would have to work at it to do a better hash.
>
> However, if you want truly fast access to values in a Lua table,
> you are probably better off using consecutive integers starting at 1.
> Since Lua 5, Lua tables have an array part as well as a hash part;
> the array part is specifically used to store small integer keys. If
> a key falls into the range of the array part, the access is direct.
> The big advantage here is that it is not necessary to store keys
> (or chain pointers) in the array part of the table, so there is
> less bookkeeping and less storage used up. Also, these integer
> keys do not collide with keys of any other data type. (Numeric
> keys which do not fall within the capacity of the array part
> are handled normally, so the whole mechanism is more or less
> transparent; when the table is rehashed, the array part is
> resized if necessary to guarantee at least 50% occupancy.)
>
> R.
>