lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



On Jul 6, 2015, at 8:02 AM, Cosmin Apreutesei <cosmin.apreutesei@gmail.com> wrote:

This IS actually possible to implement in Lua, and it's not TOO
horribly inefficient (O(n log m) overhead per construction, no
additional overhead for reading and indexing):

It gets more complicated and slow[1] if you want the tuple table to be self-cleaning though (add in gc/weak-table overhead). I would imagine that a built-in tuple type would be much more efficient since it would just index on the sum of the hashes of the elements and only check the elements on collisions, or am I missing something?



But what happens if one of the values in the tuple is a table? (yes, I know, “not allowed” is the easy way out, but what about userdata etc? It would seem that the types of values allowed in a tuple is quite limited (though presumably a nested tuple is allowed.)

The issue of “by reference” (interned like strings) or “by value” (discrete copies regardless of similarity of contents) is more subtle when looking at performance. Interning means tuple comparisons are very fast (basically a pointer compare), at the expense of initial creation and GC time. So which is faster depends very much upon the usage profile of a given application.

However, if you follow through on which values ARE allowed in a tuple, then what you are left with (imho) is a small set of value types (number, string, boolean, nested tuple) that are allowed in a tuple. Assume we now designate a “tuple table” to be a Lua sequence containing only items of these values (purely a convention). At this point it’s quite trivial to write a small C library that, given such a “tuple table”, would return a unique userdata item (garbage collected) that was (a) the same item for any identical tuple table and (b) a different item for any non-identical tuple table (where “identical” means the same value at the same sequence index in the tuple table, applied recursively to nested tuple tables).

This would give you all the functionality you need for memoization type tasks.

The API looks something like this:

tuple = totuple{ 10, “a”, 19.5, { “nested”, 100 } }

One possible (and quite efficient when coded in C) implementation would be to create a Lua string “signature" from the packed binary concatenation of type/value pairs from the tuple table (in binary, no need to convert to string form). (You would also need to add markers for nested tuples, but that’s trivial.) Then, maintain a hidden Lua “master tuple” table in the Registry, with weak values. Use the “signature" as a key, and a dummy (one byte long) full userdata as the value. All the function has to do is create the “signature" key as above, then look it up in the “master tuple” table. if its found, return the value (the userdata). If not, create a new dummy userdata, add it to the table keyed by the Lua string key, and return the new userdata.

You now have a  unique Lua value for any valid tuple, that will be the same for any given “tuple table” fed to the above API. This value can then be used as the key for whatever memoization system you want, or to compare to other tuples as desired. Performance should be pretty good, since the overhead is mostly just concatenating binary bytes to create the signature (which can be very fast), followed by a single table lookup. Each “signature” is only ever stored once, providing intern-like advantages, and the userdata (and signature) are garbage collected when no longer used (thanks to the weak values in the master tuple table).

—Tim