lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Romulo wrote:

How a hash calculation would be different and more  efficient than a
string representation? You could store the hash in the table, for
later reuse..

Uh?!?...I reread my own message. I was sure I didn't put any emphasis on performance, but I must admit I probably misused the word "overhead". In fact in the rest of my message I'm quite sure the tone is of a whole different kind. Sorry for the misunderstanding.

What I really meant, was that conceiving and then using a suitable string representation for a generic "class" with value semantics can be cumbersome and a bit inconvenient (In my head "overhead", in that circumstance, meant sort of "code clutter" - sorry - probably an interference from my mother language).

For a better explanation see below the reply to Drake.

Drake Wilson wrote:
Keep in mind (in case you weren't already aware) that Lua interns all
strings, so equality and table-key hashing for Lua string values are
O(1).  If you generate a string that already exists, you're actually
just referencing the existing string.

Oh, yes. I did know that.


To do a table lookup for a value with some kind of __hash/__eq usage
and some structure inside would (as a rough estimate of a usual case)
have to iterate the internal structure once for the __hash, then once
for each __eq (unless the objects are themselves already interned, in
which case your original problem no longer appears and you can just
use the objects themselves as keys); generating a string key iterates
the structure once and then has to hash the resulting bytes once in
order to intern it, after which it's been unified with the interned
copy.  (It's possible for intermediary combining operations to take
extra time, but I think this is true for the __hash case as well.)  If
there's no hash collisions I'd expect these to come out about the
same; if there are collisions I'd expect the string keys to be a bit
faster because the comparison cost has been minimized.

I suppose collisions in the string table itself might change that.
That and the hash function Lua 5.1 uses is technically O(1) since it
picks a sparse O(1) subset of the string to hash once the string gets
long enough.  It's all back-of-the-envelope calculations with no
actual measurement anyway, though.

As I said above, I didn't mean to argue on performance, but I find your explanation very interesting. I didn't know all these details. Thanks!


If you do experiment with __hash metamethods I'd be interested to see
performance comparison numbers.

Unfortunately (for me) I'm not skilled enough to tamper with Lua internals so much as to patch it and introduce a new metamethod.



   ---> Drake Wilson

To make myself more clear: what I meant is that when one has (or wants) to make liberal use of "value classes" face some inconvenience in coding.

Take the following example (very rough coding, but I concocted it in a very short while):


--------
local Vec_MT = {}
Vec_MT.__index = Vec_MT

-- very simple 2D vector object constructor
local function Vec(x, y)
   local vec = { x = x, y = y, repr = tostring(x) .. "|" .. tostring(y) }
   return setmetatable( vec, Vec_MT )
end

function Vec_MT:__eq(v) return self.repr == v.repr end
function Vec_MT:__add(v) return Vec(v.x + self.x, v.y + self.y) end

-- let's exercise our "class"
local v1, v2, v3 = Vec(1,2), Vec(1,2), Vec(2,4)
assert( not rawequal(v1, v2) )
assert( v1 == v2 )
assert( v1 ~= v3 )

-- now we want to put Vec objects in a table as keys
local v_set = { [v1] = true, [v2] = true, [v3] = true }
assert(not v_set[Vec(1,2)])
assert(not v_set[v1 + v2])
-- but we'd like the following to work, instead:
-- assert(v_set[Vec(1,2)])
-- assert(v_set[v1 + v2])

-- so we need a specially crafted "container" which
-- knows about "Vec" objects
local function VTable(t)
   local t_data = {}	-- actual table data stored in an upvalue
   local mt = {}
   function mt:__newindex(vec_key, value) t_data[vec_key.repr] = value end
   function mt:__index(vec_key) return t_data[vec_key.repr] end
   local vt = setmetatable( {}, mt )
   for k, v in pairs(t) do vt[k] = v end  -- initialization
   return vt
end

-- now it works:
local v_set = VTable{ [v1] = true, [v2] = true, [v3] = true }
assert(v_set[Vec(1,2)])
assert(v_set[v1 + v2])

--------


All this is a bit cumbersome, because for every new "value class" you need a new specialized "container". Therefore I'd find useful a metamethod which, if present on the item being used as key, could be used to retrieve the actual key used for hashing.

For example:

function Vec_MT:__hash() return self.repr end

But I realize that that maybe this check on every access for every key could be a big performance hit.

That's why I asked: I expected that someone could say I proposed something silly, but being criticized constructively is a good way to learn faster :-).

I'm discovering day by day that I have to unlearn so much! :-)

--

Lorenzo