Splay Ropes

lua-users home
wiki

Ropes are binary trees each of whose nodes contains a string; the actual string can be read off of the rope with an in-order traverse.

This implementation uses functional splay trees to keep the ropes balanced; the implementation of splay trees was partially inspired by Chris Okasaki's excellent book on pure functional datatypes.

The nodes are implemented here with tuples, implemented with a tiny addition to the Lua base library:

static int l_tuple(lua_State *L) {
   lua_pushcclosure(L, lua_pushupvalues, lua_gettop(L));
   return 1;
}

/* also add {"tuple", l_tuple} to luaL_reg base_funcs[] */

If you don't want to recompile Lua, you can implement the tuple function as follows:

local tuplemeta = {__call = table.unpack}
function tuple(...) return setmetatable(arg, tuplemeta) end

It will be slower and use more memory, but it is pure Lua.

The rope library really got too big to fit on a Wiki, so I put it in the files area, along with a little Knuth Morris Pratt search/replace function as an example of how to use ropes (although character by character searching in Lua is not fast.)

To load the libraries, use LTN-11 style: (local) rope = dofile"rope.lua"()

Download the libraries here: Files:wiki_insecure/users/rici/rope.lua and here: Files:wiki_insecure/users/rici/ropesearch.lua

Ropes are particularly useful for large text strings where there are random changes at localised points; that would make them ideal to implement a text editor, for example. Functional splay trees also have the interesting feature that you can simply keep a reference to the rope at any time, and later "return" to it; consequently maintaining an undo history is trivial. A single localised change is not likely to create more than a few splay nodes.

The splay ropes are implemented using an interval map: each node in the tree contains a left and right pointer, a string, and the relative position of the end of the string (this could have been the relative position of the beginning of the string or the total size of the node plus all its descendants; the three formulations are computationally equivalent.) This means that a node can be in more than one rope, or even more than one place in the same rope -- an extreme example of that is shown by the implementation of rope.rep, which uses intensive node-sharing to allow it to create ropes of size much greater than RAM: rope.rep("a", 1E40) takes a few milliseconds and uses about 10 kb, but it really is the representation of a string with 10^40 characters. IMHO, this is the sort of datastructure you end up with when you embrace immutable objects and garbage collection, instead of trying to implement C datatypes in a functional garbage collected environment.

Another interesting fact about interval maps is that two of them can be kept with synchronised indexes, without sharing the same topology. The hypothetical text editor could, for example, keep text style information in another interval map, simply by executing the same insert, delete and replace operations on a stylerun interval map, even if the stylerun interval map was consolidating consecutive identical styles.

Splay trees are useful in this sort of application because they do not need to be rebalanced after arbitrary deletion or insertion operations. On the other hand, the fact that the splay tree is self-adjusting means that you get a new splay object even after a lookup, not just after a mutation, and the performance of the splay tree depends on using the new object for subsequent lookups. For this reasons, splay trees are usually considered to be inappropriate for functional programs, particularly those which depend on persistence (or time travel, if you prefer); however, I believe that the text editor application would demonstrate that this is not actually true.


RecentChanges · preferences
edit · history
Last edited April 13, 2004 11:16 pm GMT (diff)