[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: hash metamethod?
- From: Joshua Haberman <joshua@...>
- Date: Sun, 1 Feb 2009 19:42:39 +0000 (UTC)
Alex Davies <alex.mania <at> iinet.net.au> writes:
> Hey Joshua,
>
> > There is one problem with my proposed scheme that bothers me a bit. The
> > question is: what would a table return as its default __hash and __eql
> > methods? On one hand I'd want it to treat the table by value, calling
> > __hash and __eql on all of its individual members, so that tables with the
> > same contents hash the same.
>
> I just want to say definitely not :). I don't think even a second should be
> spent considering that - this is Lua, not Python. We're not afraid to call
> two different tables just that - different. (Don't get me wrong, I think
> Python's great - but I do think that this as default behaviour of its kind
> of weird/pointless, and I cannot see how it could be implemented remotely
> efficiently.) Similarly simply adding tables as keys (oftenly done) would
> be very slow -potentially many comparisons of equality have to be done on
> each key lookup - tables are all we have, lets keep them fast.
I see your point. Luckily, thinking about this more I realized I could get
value-based table hashing by doing it myself in Lua. If __hash and __eql were
supported, I could do this:
-- The hash function exposed by the Lua C runtime. Does something like:
function hash(value)
local hash_metamethod = getmetatable(value).__hash
if hash_metamethod then
return hash_metamethod(value)
else
return lua_internal_hash(value)
end
end
-- The hash combiner function exposed by the Lua C runtime. Does
-- something like:
function combinehash(existing_hash, hash_to_add)
-- xor doesn't exist in Lua but it does in C!
return existing_hash ^ hash_to_add
end
-- A hash function for tables that hashes all tables and subtables that
-- have no __hash metamethod by their *value*.
function table_value_hash(table)
local hash_val = 0
for k,v in pairs(table) do
hash_val = combinehash(hash, hash(k))
if type(v) == "table" and not getmetatable(v).__hash then
-- Hash sub-tables with no __hash metamethod by value also.
hash_val = combinehash(hash_val, table_value_hash(v))
else
hash_val = combinehash(hash_val, hash(v))
end
end
return hash_val
end
t = {}
t[{}] = "foo" -- works like it always has
value_table1 = {foo={bar="baz"}}
value_table2 = {foo={bar="baz"}}
value_mt = {__hash=table_value_hash, __eql=table_value_eql}
setmetatable(value_table1, value_mt)
setmetatable(value_table2, value_mt)
-- the two value tables now hash to the same thing, despite the fact that
-- they are completely distinct objects and have nested data.
t[value_table1] = true
t[value_table2] = true -- not inserted, already present
> Anyway, a patch for a simple __hash would not be hard to make. Only a
> couple of places would have to be modified. Again though, you have to
> remember that tables are integral to Lua's performance - anything that
> bloats or slows them really has to be considered very carefully. And with
> that in mind, Lua has no extra space in the nodes in its tables to store the
> hash of each key. So whenever the table gets resized, table keys would have
> to be checked for a __hash metamethod and recalled (how does Python handle
> it?).
Yep -- seems like a simple time/space tradeoff to be made here. Either give
table nodes extra space to save their hash value, or take the time hit of
recomputing it.
> Along with being slow, this could also cause problems in the future
> in the unlikely case that the gc starts resizing tables. (Actually I lie -
> if you want to 8 byte align the Node data structure you could fit a hash
> value in there).
Excellent. :)
> Personally I do think a __hash metamethod would be a large improvement where
> Lua is used largely as a standalone language. As an extension/scripting
> language not quite so important.
Understood -- I am using it as a standalone language. I might as well point you
all to it now. I'm using Lua to write the compiler (and soon the extension
language too) for a new parsing framework called Gazelle:
http://www.gazelle-parser.org/
I've been working on it for almost two years now, and while it still has a long
way to go, I'm making great strides. One of my next goals is for it to be able
to parse Lua -- see my prototype Lua grammar for Gazelle here (it doesn't
actually work yet, because it depends on features of Gazelle that aren't
finished yet):
http://github.com/haberman/gazelle/blob/d2393dbe1ea61c2281bf273761e083970f8a84c1/sketches/lua.gzl
In any case, the compiler involves a lot of set manipulation, so being able to
define identity with __hash and __eql would be really nice.
The reason I've chosen to use Lua standalone here is that I LOVE how lightweight
and clean its runtime is. No other language runtime comes close. It's the only
high-level language that I'd feel comfortable asking my users to link into their
binaries. And I also like how fast it is.
> Anyway, good luck keep us informed. :)
Thanks! :)
Josh