Re: value semantics and __hash metamethod (?)

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: value semantics and __hash metamethod (?)
From: Lorenzo Donati <lorenzodonatibz@...>
Date: Sat, 20 Nov 2010 19:35:12 +0100

Romulo wrote:

How a hash calculation would be different and more  efficient than a
string representation? You could store the hash in the table, for
later reuse..

Uh?!?...I reread my own message. I was sure I didn't put any emphasis onperformance, but I must admit I probably misused the word "overhead". Infact in the rest of my message I'm quite sure the tone is of a wholedifferent kind. Sorry for the misunderstanding.

What I really meant, was that conceiving and then using a suitablestring representation for a generic "class" with value semantics can becumbersome and a bit inconvenient (In my head "overhead", in thatcircumstance, meant sort of "code clutter" - sorry - probably aninterference from my mother language).


For a better explanation see below the reply to Drake.

Drake Wilson wrote:

Keep in mind (in case you weren't already aware) that Lua interns all
strings, so equality and table-key hashing for Lua string values are
O(1).  If you generate a string that already exists, you're actually
just referencing the existing string.


Oh, yes. I did know that.


To do a table lookup for a value with some kind of __hash/__eq usage
and some structure inside would (as a rough estimate of a usual case)
have to iterate the internal structure once for the __hash, then once
for each __eq (unless the objects are themselves already interned, in
which case your original problem no longer appears and you can just
use the objects themselves as keys); generating a string key iterates
the structure once and then has to hash the resulting bytes once in
order to intern it, after which it's been unified with the interned
copy.  (It's possible for intermediary combining operations to take
extra time, but I think this is true for the __hash case as well.)  If
there's no hash collisions I'd expect these to come out about the
same; if there are collisions I'd expect the string keys to be a bit
faster because the comparison cost has been minimized.

I suppose collisions in the string table itself might change that.
That and the hash function Lua 5.1 uses is technically O(1) since it
picks a sparse O(1) subset of the string to hash once the string gets
long enough.  It's all back-of-the-envelope calculations with no
actual measurement anyway, though.

As I said above, I didn't mean to argue on performance, but I find yourexplanation very interesting. I didn't know all these details. Thanks!


If you do experiment with __hash metamethods I'd be interested to see
performance comparison numbers.

Unfortunately (for me) I'm not skilled enough to tamper with Luainternals so much as to patch it and introduce a new metamethod.


   ---> Drake Wilson

To make myself more clear: what I meant is that when one has (or wants)to make liberal use of "value classes" face some inconvenience in coding.

Take the following example (very rough coding, but I concocted it in avery short while):



--------
local Vec_MT = {}
Vec_MT.__index = Vec_MT

-- very simple 2D vector object constructor
local function Vec(x, y)
   local vec = { x = x, y = y, repr = tostring(x) .. "|" .. tostring(y) }
   return setmetatable( vec, Vec_MT )
end

function Vec_MT:__eq(v) return self.repr == v.repr end
function Vec_MT:__add(v) return Vec(v.x + self.x, v.y + self.y) end

-- let's exercise our "class"
local v1, v2, v3 = Vec(1,2), Vec(1,2), Vec(2,4)
assert( not rawequal(v1, v2) )
assert( v1 == v2 )
assert( v1 ~= v3 )

-- now we want to put Vec objects in a table as keys
local v_set = { [v1] = true, [v2] = true, [v3] = true }
assert(not v_set[Vec(1,2)])
assert(not v_set[v1 + v2])
-- but we'd like the following to work, instead:
-- assert(v_set[Vec(1,2)])
-- assert(v_set[v1 + v2])

-- so we need a specially crafted "container" which
-- knows about "Vec" objects
local function VTable(t)
   local t_data = {}	-- actual table data stored in an upvalue
   local mt = {}
   function mt:__newindex(vec_key, value) t_data[vec_key.repr] = value end
   function mt:__index(vec_key) return t_data[vec_key.repr] end
   local vt = setmetatable( {}, mt )
   for k, v in pairs(t) do vt[k] = v end  -- initialization
   return vt
end

-- now it works:
local v_set = VTable{ [v1] = true, [v2] = true, [v3] = true }
assert(v_set[Vec(1,2)])
assert(v_set[v1 + v2])

--------

All this is a bit cumbersome, because for every new "value class" youneed a new specialized "container".Therefore I'd find useful a metamethod which, if present on the itembeing used as key, could be used to retrieve the actual key used forhashing.


For example:

function Vec_MT:__hash() return self.repr end

But I realize that that maybe this check on every access for every keycould be a big performance hit.

That's why I asked: I expected that someone could say I proposedsomething silly, but being criticized constructively is a good way tolearn faster :-).


I'm discovering day by day that I have to unlearn so much! :-)

--

Lorenzo

Follow-Ups:
- Re: value semantics and __hash metamethod (?), Renato Maia

References:
- value semantics and __hash metamethod (?), Lorenzo Donati
- Re: value semantics and __hash metamethod (?), Drake Wilson

Prev by Date: Re: Quaternary Operator
Next by Date: Re: Quaternary Operator
Previous by thread: Re: value semantics and __hash metamethod (?)
Next by thread: Re: value semantics and __hash metamethod (?)
Index(es):
- Date
- Thread