Re: Cons, tuple and __slots__ -- a simple mechanism to implement them in Lua

On Sep 17, 2011, at 6:45 AM, Fabien wrote:

If we were to play and introduce a new structure (as a patch of course ;-)), I'd want it to offer more than micro-optimisations by saving a couple of table creations and unpackings.

A feature of tables, which makes some algorithm implementations cumbersome, is that they're mutable. A consequence is that they don't support cheap structural equality.
Supposing that we denote tuples with angles <...>, I'd like to have assert(<1,2,3>==<1,2,3>);

I'd like this test to be as fast as string comparisons.
I'd want tuples to work well as table keys (easy if they have cheap hashing and equality).
This would require tuples to be immutable, and to only accept to contain immutable values.

Since they won't be mutable, the base library working on them should be purely functional. Doing functional stuff with table always feels dangerous, because functional invariants break down as soon as someone alters a table's content.

This means that it wouldn't be suitable as a type for "...". However, I believe it's a non-issue: non-trivial manipulations of "..." are easy if you put everything in a table "{...}"; and I'd be very curious to be shown any profiling result which pinpoints this "{...}" idiom as a cause of real-life inadequate performances...

It's possible to write a tuple constructor satisfying your goals about comparison. Construction is linear in the number of elements, but the constant factor is reasonably high.

Here is a quick sketch of how to implement a "hash cons". General tuples can then be built from this.

A "hash cons" pair is represented as a table or closure referencing the cons'd values. In addition, it will need to hold one back pointer as we will see in a moment.

To ensure that cons( a, b ) always produces the same result for the same input values a and b (essential if we're going to use them as table keys), we use a cascaded lookup through weak tables.

T1 is a fully weak map from values to tables T2. We look up the first parameter in T1 creating a new table if none exists.

T2 is then used to map the second parameter to the actual tuple.

So, if everything were already constructed, we would have:

function cons( a, b ) return T1[ a ][ b ] end

But now we actually need to make it work and this takes addressing a number of details.

First, nil and NaN don't work as keys. One could debate how NaN is supposed to work with tuples, but essentially we need a function "safekey" that remaps these to magic values that won't be used anywhere outside the tuple implementation:

function cons( a, b ) return T1[ safekey( a ) ][ safekey( b ) ] end

Next we get to the need for a back link. If cons( a, b ) is still alive, then we need T1[ safekey( a ) ] to stay alive in case we were to invoke cons( a, b ) again. We handle this by having a back link from the cons pair to the second level table.

For security, the cons pair should be implemented to be immutable and the back link should be kept private. This pushes toward implementing the tuple as a closure though if space is a concern it would be better as a C closure.

Finally, as I noted above given a "hash cons", it is straightforward to implement a general "hash tuple" either by stacking up cons pairs appropriately or by generalizing the implementation to support more intermediate tables (and associated back links).

Mark

P.S. While we're at it, given the weak tables here, it's probably worthwhile to see the recent finalization discussion regarding the dangers of mixing finalization and weak tables.