lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Alex Davies <alex.mania <at> iinet.net.au> writes:
> Hey Joshua,
> 
> > There is one problem with my proposed scheme that bothers me a bit.  The 
> > question is: what would a table return as its default __hash and __eql 
> > methods?  On one hand I'd want it to treat the table by value, calling 
> > __hash and __eql on all of its individual members, so that tables with the 
> > same contents hash the same.
> 
> I just want to say definitely not :).  I don't think even a second should be 
> spent considering that - this is Lua, not Python.  We're not afraid to call 
> two different tables just that - different.  (Don't get me wrong, I think 
> Python's great - but I do think that this as default behaviour of its kind 
> of weird/pointless, and I cannot see how it could be implemented remotely 
> efficiently.)  Similarly simply adding tables as keys (oftenly done) would 
> be very slow -potentially many comparisons of equality have to be done on 
> each key lookup - tables are all we have, lets keep them fast.

I see your point.  Luckily, thinking about this more I realized I could get
value-based table hashing by doing it myself in Lua.  If __hash and __eql were
supported, I could do this:

-- The hash function exposed by the Lua C runtime.  Does something like:
function hash(value)
  local hash_metamethod = getmetatable(value).__hash
  if hash_metamethod then
    return hash_metamethod(value)
  else
    return lua_internal_hash(value)
  end
end

-- The hash combiner function exposed by the Lua C runtime.  Does
-- something like:
function combinehash(existing_hash, hash_to_add)
  -- xor doesn't exist in Lua but it does in C!
  return existing_hash ^ hash_to_add
end

-- A hash function for tables that hashes all tables and subtables that
-- have no __hash metamethod by their *value*.
function table_value_hash(table)
  local hash_val = 0
  for k,v in pairs(table) do
    hash_val = combinehash(hash, hash(k))
    if type(v) == "table" and not getmetatable(v).__hash then
      -- Hash sub-tables with no __hash metamethod by value also.
      hash_val = combinehash(hash_val, table_value_hash(v))
    else
      hash_val = combinehash(hash_val, hash(v))
    end
  end
  return hash_val
end

t = {}
t[{}] = "foo"  -- works like it always has

value_table1 = {foo={bar="baz"}}
value_table2 = {foo={bar="baz"}}

value_mt = {__hash=table_value_hash, __eql=table_value_eql}
setmetatable(value_table1, value_mt)
setmetatable(value_table2, value_mt)

-- the two value tables now hash to the same thing, despite the fact that
-- they are completely distinct objects and have nested data.
t[value_table1] = true
t[value_table2] = true  -- not inserted, already present

> Anyway, a patch for a simple __hash would not be hard to make.  Only a 
> couple of places would have to be modified.  Again though, you have to 
> remember that tables are integral to Lua's performance - anything that 
> bloats or slows them really has to be considered very carefully.  And with 
> that in mind, Lua has no extra space in the nodes in its tables to store the 
> hash of each key. So whenever the table gets resized, table keys would have 
> to be checked for a __hash metamethod and recalled (how does Python handle 
> it?).

Yep -- seems like a simple time/space tradeoff to be made here.  Either give
table nodes extra space to save their hash value, or take the time hit of
recomputing it.

> Along with being slow, this could also cause problems in the future 
> in the unlikely case that the gc starts resizing tables.  (Actually I lie - 
> if you want to 8 byte align the Node data structure you could fit a hash 
> value in there).

Excellent.  :)

> Personally I do think a __hash metamethod would be a large improvement where 
> Lua is used largely as a standalone language.  As an extension/scripting 
> language not quite so important.

Understood -- I am using it as a standalone language.  I might as well point you
all to it now.  I'm using Lua to write the compiler (and soon the extension
language too) for a new parsing framework called Gazelle:

http://www.gazelle-parser.org/

I've been working on it for almost two years now, and while it still has a long
way to go, I'm making great strides.  One of my next goals is for it to be able
to parse Lua -- see my prototype Lua grammar for Gazelle here (it doesn't
actually work yet, because it depends on features of Gazelle that aren't
finished yet):

http://github.com/haberman/gazelle/blob/d2393dbe1ea61c2281bf273761e083970f8a84c1/sketches/lua.gzl

In any case, the compiler involves a lot of set manipulation, so being able to
define identity with __hash and __eql would be really nice.

The reason I've chosen to use Lua standalone here is that I LOVE how lightweight
and clean its runtime is.  No other language runtime comes close. It's the only
high-level language that I'd feel comfortable asking my users to link into their
binaries.  And I also like how fast it is.

> Anyway, good luck keep us informed. :)

Thanks!  :)

Josh