[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Bloom filters.
- From: Philipp Janda <siffiejoe@...>
- Date: Fri, 10 Jan 2014 17:05:03 +0100
Am 10.01.2014 15:17 schröbte Jorge:
Hi,
Hi!
I need a pure-Lua implementation of a Bloom filter
(http://en.wikipedia.org/wiki/Bloom_filter), and the only reference i've
found is this, from 2005:
http://lua.2524044.n2.nabble.com/bloom-filter-td7611346.html
A nice piece of code, actually. The bitset is a table mapping positive
integer keys to boolean values, but instead of just using
{ [0] = true, [23] = true, [52] = true }
it compresses 52 consecutive integer keys into a single integer key and
uses a Lua number to represent a 52-bit wide bitset as the value. So
{ -- if Lua had binary literals
[0] = 0b100000000000000000000001, -- bit 0 -> 0, bit 23 -> 23
[1] = 0b1 -- bit 0 -> 52
}
So to implement your bitwise "or" you iterate over both tables via
`pairs`. If only one table has a value at the given index but not the
other you just copy the key-value pair to the result table. If both
table have a value at a given index you take the bitwise "or" of both
values and store it in the result table.
Now you just need to figure out how to do a bitwise "or" of two 52-bit
wide numbers in Lua (you can probably reuse some code from the `test`
function) ...
I've implemented a Bloom filter based on the bitset referenced in the
thread (by Wim Couwenberg) and it works, but i'm stuck at how to do a
merge (union) of two filters. In abstract, the union is just the "or" of
two filters, but that bitset implementation has me stumped. I just don't
understand how it works!
Has anyone worked with Bloom filters in Lua?
How do you hash arbitrary Lua values (e.g. tables) from within Lua? Or
do you limit yourself to certain keys? I thought about implementing
Bloom filters for fun not long ago, but I couldn't figure out how to
hash a table value (or userdata, etc.) from within Lua ...
Thanks in advance.
Jorge
HTH,
Philipp