Splay Ropes |
|
The rope library:--------------rope.lua------------ And here is a little library that uses it: |
|
If you don't want to recompile Lua, you can implement the tuple function as follows: |
|
---------kmp.lua----------- -- -- A small example of using ropes -- (and maybe not a very good one) |
|
local tuplemeta = {__call = table.unpack} function tuple(...) return setmetatable(arg, tuplemeta) end }}} |
|
local slen, ssub = string.len, string.sub |
|
It will be slower and use more memory, but it is pure Lua. |
|
return function() -- returns a function which matches p; it accepts characters one -- at a time, and returns the length of the match (i.e. string.len(p)) -- when it is fed the last matching character. -- The algorithm was taken more or less straight out of Sedgewick, -- adjusted to be 1-based and to be inline rather than a string scan. -- There are ways to make this work with wildcards and even intervals, -- but this is just an example. local function kmp(p) -- We compute the transition vector and other goodies, and then -- return a custom matcher function. It is convenient to also -- split the pattern into a table local P, M = {}, slen(p); for i = 1, M do P[i] = ssub(p, i, i) end local i, j, next = 1, 0, {0} while i < M do while j > 0 and P[i] ~= P[j] do j = next[j] end i, j = i + 1, j + 1 next[i] = (P[i] == P[j] and next[j]) or j end |
|
The rope library really got too big to fit on a Wiki, so I put it in the files area, along with a little Knuth Morris Pratt search/replace function as an example of how to use ropes (although character by character searching in Lua is not fast.) |
|
j = 1 -- used as closure return function(ch) while j > 0 and ch ~= P[j] do j = next[j] end j = j + 1 if j > M then j = 1; return M end end end |
To load the libraries, use LTN-11 style: (local) rope = dofile"rope.lua"() |
|
-- Given a rope r, a pattern string p and a replacement string q, -- returns a rope with all occurrences of p to q. -- (p and q are strings, not ropes, -- but that could be fixed easily enough.) local function replaceAll(r, p, q) local rv = r local scanner = kmp(p) local j = 0 for _, seg in rope.segs(r) do for i = 1, slen(seg) do local m = scanner(ssub(seg, i, i)) if m then rv = rope.replace(rv, j + i - m + 1, j + i, q) end end j = j + slen(seg) end return rv end |
|
Download the libraries here: Files:wiki_insecure/users/rici/rope.lua and here: Files:wiki_insecure/users/rici/ropesearch.lua |
|
local sr = { kmp = kmp, replaceAll = replaceAll } return sr end }}} |
|
Ropes are particularly useful for large text strings where there are random changes at localised points; that would make them ideal to implement a text editor, for example. Functional splay trees also have the interesting feature that you can simply keep a reference to the rope at any time, and later "return" to it; consequently maintaining an undo history is trivial. A single localised change is not likely to create more than a few splay nodes. |
The splay ropes are implemented using an interval map: each node in the tree contains a left and right pointer, a string, and the relative position of the end of the string (this could have been the relative position of the beginning of the string or the total size of the node plus all its descendants; the three formulations are computationally equivalent.) This means that a node can be in more than one rope, or even more than one place in the same rope -- an extreme example of that is shown by the implementation of rope.rep, which uses intensive node-sharing to allow it to create ropes of size much greater than RAM: rope.rep("a", 1E40) takes a few milliseconds and uses about 10 kb, but it really is the representation of a string with 10^40 characters. IMHO, this is the sort of datastructure you end up with when you embrace immutable objects and garbage collection, instead of trying to implement C datatypes in a functional garbage collected environment. |
To load the libraries, use LTN-11 style: (local) rope = dofile"rope.lua"() |
Another interesting fact about interval maps is that two of them can be kept with synchronised indexes, without sharing the same topology. The hypothetical text editor could, for example, keep text style information in another interval map, simply by executing the same insert, delete and replace operations on a stylerun interval map, even if the stylerun interval map was consolidating consecutive identical styles. |
An interesting feature of the implementation of rope.rep is its intensive node-sharing, possible only because of the use of functional nodes and the existence of a garbage collector. rope.rep can create truly enormous strings: try a = rope.rep("a", 1E40) which occupies about 10k. |
|
Splay trees are useful in this sort of application because they do not need to be rebalanced after arbitrary deletion or insertion operations. On the other hand, the fact that the splay tree is self-adjusting means that you get a new splay object even after a lookup, not just after a mutation, and the performance of the splay tree depends on using the new object for subsequent lookups. For this reasons, splay trees are usually considered to be inappropriate for functional programs, particularly those which depend on persistence (or time travel, if you prefer); however, I believe that the text editor application would demonstrate that this is not actually true. |
This implementation uses functional splay trees to keep the ropes balanced; the implementation of splay trees was partially inspired by Chris Okasaki's excellent book on pure functional datatypes.
The nodes are implemented here with tuples, implemented with a tiny addition to the Lua base library:
static int l_tuple(lua_State *L) {
lua_pushcclosure(L, lua_pushupvalues, lua_gettop(L));
return 1;
}
/* also add {"tuple", l_tuple} to luaL_reg base_funcs[] */
If you don't want to recompile Lua, you can implement the tuple function as follows:
local tuplemeta = {__call = table.unpack}
function tuple(...) return setmetatable(arg, tuplemeta) end
It will be slower and use more memory, but it is pure Lua.
The rope library really got too big to fit on a Wiki, so I put it in the files area, along with a little Knuth Morris Pratt search/replace function as an example of how to use ropes (although character by character searching in Lua is not fast.)
To load the libraries, use LTN-11 style: (local) rope = dofile"rope.lua"()
Download the libraries here: Files:wiki_insecure/users/rici/rope.lua and here: Files:wiki_insecure/users/rici/ropesearch.lua
Ropes are particularly useful for large text strings where there are random changes at localised points; that would make them ideal to implement a text editor, for example. Functional splay trees also have the interesting feature that you can simply keep a reference to the rope at any time, and later "return" to it; consequently maintaining an undo history is trivial. A single localised change is not likely to create more than a few splay nodes.
The splay ropes are implemented using an interval map: each node in the tree contains a left and right pointer, a string, and the relative position of the end of the string (this could have been the relative position of the beginning of the string or the total size of the node plus all its descendants; the three formulations are computationally equivalent.) This means that a node can be in more than one rope, or even more than one place in the same rope -- an extreme example of that is shown by the implementation of rope.rep, which uses intensive node-sharing to allow it to create ropes of size much greater than RAM: rope.rep("a", 1E40) takes a few milliseconds and uses about 10 kb, but it really is the representation of a string with 10^40 characters. IMHO, this is the sort of datastructure you end up with when you embrace immutable objects and garbage collection, instead of trying to implement C datatypes in a functional garbage collected environment.
Another interesting fact about interval maps is that two of them can be kept with synchronised indexes, without sharing the same topology. The hypothetical text editor could, for example, keep text style information in another interval map, simply by executing the same insert, delete and replace operations on a stylerun interval map, even if the stylerun interval map was consolidating consecutive identical styles.
Splay trees are useful in this sort of application because they do not need to be rebalanced after arbitrary deletion or insertion operations. On the other hand, the fact that the splay tree is self-adjusting means that you get a new splay object even after a lookup, not just after a mutation, and the performance of the splay tree depends on using the new object for subsequent lookups. For this reasons, splay trees are usually considered to be inappropriate for functional programs, particularly those which depend on persistence (or time travel, if you prefer); however, I believe that the text editor application would demonstrate that this is not actually true.