lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Am 10.01.2014 15:17 schröbte Jorge:
Hi,

Hi!


I need a pure-Lua implementation of a Bloom filter
(http://en.wikipedia.org/wiki/Bloom_filter), and the only reference i've
found is this, from 2005:

http://lua.2524044.n2.nabble.com/bloom-filter-td7611346.html

A nice piece of code, actually. The bitset is a table mapping positive integer keys to boolean values, but instead of just using

    { [0] = true, [23] = true, [52] = true }

it compresses 52 consecutive integer keys into a single integer key and uses a Lua number to represent a 52-bit wide bitset as the value. So

    {  -- if Lua had binary literals
      [0] = 0b100000000000000000000001,  -- bit 0 -> 0, bit 23 -> 23
      [1] = 0b1                          -- bit 0 -> 52
    }

So to implement your bitwise "or" you iterate over both tables via `pairs`. If only one table has a value at the given index but not the other you just copy the key-value pair to the result table. If both table have a value at a given index you take the bitwise "or" of both values and store it in the result table. Now you just need to figure out how to do a bitwise "or" of two 52-bit wide numbers in Lua (you can probably reuse some code from the `test` function) ...


I've implemented a Bloom filter based on the bitset referenced in the
thread (by Wim Couwenberg) and it works, but i'm stuck at how to do a
merge (union) of two filters. In abstract, the union is just the "or" of
two filters, but that bitset implementation has me stumped. I just don't
understand how it works!

Has anyone worked with Bloom filters in Lua?

How do you hash arbitrary Lua values (e.g. tables) from within Lua? Or do you limit yourself to certain keys? I thought about implementing Bloom filters for fun not long ago, but I couldn't figure out how to hash a table value (or userdata, etc.) from within Lua ...


Thanks in advance.

Jorge


HTH,
Philipp