Hi,

it seems to be a common idiom to use string.gsub in combination
with a substitution table. Currently this requires adding a
trivial function every time:

  local subst = { foo = "abc", bar = "xyz" }

  local function substfunc(c)
    return subst[c]
  end

  local result = string.gsub(str, "%w+", substfunc)

Or more commonly:

  local function substfunc(c)
    return subst[c] or c
  end

[This is more common mainly because Lua's patterns don't allow
alternation ('|'). It's also easier and faster to widen the match
and do an identity substitution than to narrow the match. Captures
complicate the issue because you may need to reconstruct the
whole match to avoid losing the surrounding match context.]
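
As a small illustration of widening the match: suppose only the
words foo and bar should be replaced. Lua patterns cannot express
"foo|bar", so the sketch below matches every word with "%w+" and
lets the function hand unknown words back unchanged. The sample
input is made up for this example.

```lua
-- Replace only "foo" and "bar". Lua patterns have no alternation,
-- so match every word and map unknown words to themselves.
local subst = { foo = "abc", bar = "xyz" }

local function substfunc(c)
  return subst[c] or c  -- identity substitution for unknown words
end

-- Sample input, made up for this illustration.
local str = "foo, bar and baz"
local result = string.gsub(str, "%w+", substfunc)
print(result)  --> abc, xyz and baz
```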

Using an anonymous function and/or an anonymous table is
inefficient because these are recreated on every call. So
inevitably you end up with an additional dummy function in an
outer scope. This is both inconvenient and error-prone. I think
I've written dozens of incarnations of this idiom and finally got
annoyed enough to search for better options.
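
To make the trade-off concrete, here is a sketch of both styles.
The function names and the sample data are invented for this
illustration. The inline version builds a new table and a new
closure on every call, while the hoisted version creates both
once in the outer scope.

```lua
-- Inline style: everything lives inside the function that needs it,
-- but the table and the closure are rebuilt on each call.
local function expand_inline(str)
  local subst = { foo = "abc", bar = "xyz" }   -- new table per call
  return (string.gsub(str, "%w+", function(c)  -- new closure per call
    return subst[c] or c
  end))
end

-- Hoisted style: table and helper are created once in the outer scope.
local subst = { foo = "abc", bar = "xyz" }
local function substfunc(c)
  return subst[c] or c
end
local function expand_hoisted(str)
  -- The extra parentheses drop gsub's second result (the match count).
  return (string.gsub(str, "%w+", substfunc))
end

print(expand_inline("foo bar baz"))   --> abc xyz baz
print(expand_hoisted("foo bar baz"))  --> abc xyz baz
```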


I think it would be much simpler if string.gsub supported a
substitution table out of the box. Looking over the code, this
is really a trivial change.

The appended patch (relative to Lua 5.1-alpha) allows you to
directly specify a substitution table instead of the substitution
string or substitution function for string.gsub. The table key is
derived from the first capture or the whole match (if there are
no captures). The above example can now be simplified to:

  local subst = { foo = "abc", bar = "xyz" }

  local result = string.gsub(str, "%w+", subst)

It behaves like the second variant above, i.e. for keys that are
not found in the table (the lookup yields nil) the whole match is
kept (and not just the capture). A string or number value replaces
the match. Any other value is treated like an empty replacement
string.
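
The lookup rules can be emulated in plain Lua, which may help when
reading the patch. This is only a sketch: gsub_table is a
hypothetical helper, not part of the patch, and it assumes a
released Lua 5.1 or later, where a replacement function that
returns nil leaves the whole match in place.

```lua
-- Hypothetical helper that mimics the table lookup rules of the patch.
-- Assumes released Lua 5.1 or later, where a nil return value from the
-- replacement function keeps the whole match.
local function gsub_table(str, pattern, tab)
  return (string.gsub(str, pattern, function(key)
    -- key is the first capture, or the whole match if there are none
    local v = tab[key]
    local t = type(v)
    if t == "string" then return v end
    if t == "number" then return tostring(v) end
    if v == nil then return nil end  -- not found: keep the whole match
    return ""                        -- any other value: empty replacement
  end))
end

-- Sample table and input, made up for this illustration.
local subst = { name = "world", count = 3, skip = true }
print(gsub_table("$name $count $skip $other", "%$(%w+)", subst))
--> world 3  $other
```

Note that "$other" survives with its "$" intact: the key is the
capture, but the text that is kept is the whole match.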

You can do interesting things by chaining tables or adding an
__index metamethod (e.g. a fast substitution cache).
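
Both ideas can be sketched briefly. The sketch assumes a
string.gsub that accepts a table as its third argument (the
patched build, or any released Lua 5.1 or later) and a lookup that
honours __index, as the lua_gettable call in the patch does. The
table names and the upper-casing stand-in for an expensive
computation are invented for this example.

```lua
-- Chaining: keys missing from overrides fall back to defaults.
local defaults = { foo = "abc", bar = "xyz" }
local overrides = setmetatable({ bar = "XYZ" }, { __index = defaults })
local chained = string.gsub("foo bar baz", "%w+", overrides)
print(chained)  --> abc XYZ baz

-- Cache: compute a replacement on first use and store it in the table,
-- so later matches of the same word are plain table hits.
local misses = 0
local cache = setmetatable({}, {
  __index = function(t, word)
    misses = misses + 1
    local value = string.upper(word)  -- stand-in for an expensive computation
    rawset(t, word, value)
    return value
  end,
})
local cached = string.gsub("a b a b a", "%w+", cache)
print(cached)  --> A B A B A
print(misses)  --> 2
```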

Apart from the simpler usage it's also around twice as fast as
the old style, which has to call a function for every match.
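
Anyone who wants to check the speed difference on their own build
could start from a rough timing sketch like the following. It
assumes a string.gsub that accepts a table (the patched build, or
a released Lua 5.1 or later). The input text and the iteration
count are arbitrary, and no particular ratio is implied.

```lua
local subst = { foo = "abc", bar = "xyz" }
local function substfunc(c)
  return subst[c] or c
end

-- Arbitrary test input: a few words repeated many times.
local str = string.rep("foo bar baz ", 1000)

-- Time n runs of gsub with the given replacement argument.
-- Returns the elapsed CPU time and the last result string.
local function time_gsub(repl, n)
  local start = os.clock()
  local result
  for _ = 1, n do
    result = string.gsub(str, "%w+", repl)
  end
  return os.clock() - start, result
end

local t_func, r_func = time_gsub(substfunc, 200)
local t_tab, r_tab = time_gsub(subst, 200)
assert(r_func == r_tab)  -- both styles must produce the same text
print(string.format("function: %.3fs  table: %.3fs", t_func, t_tab))
```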

Bye,
     Mike
--- lua-5.1-alpha/src/lstrlib.c	2005-08-26 19:36:32 +0200
+++ lua-5.1-alpha-gsubtab/src/lstrlib.c	2005-10-15 17:08:09 +0200
@@ -588,7 +588,8 @@
 static void add_s (MatchState *ms, luaL_Buffer *b,
                    const char *s, const char *e) {
   lua_State *L = ms->L;
-  if (lua_isstring(L, 3)) {
+  switch (lua_type(L, 3)) {
+  case LUA_TSTRING: case LUA_TNUMBER: {
     size_t l;
     const char *news = lua_tolstring(L, 3, &l);
     size_t i;
@@ -610,8 +611,9 @@
         }
       }
     }
+    break;
   }
-  else {  /* is a function */
+  case LUA_TFUNCTION: {
     int n;
     lua_pushvalue(L, 3);
     n = push_captures(ms, s, e);
@@ -620,6 +622,28 @@
       luaL_addvalue(b);  /* add return to accumulated result */
     else
       lua_pop(L, 1);  /* function result is not a string: pop it */
+    break;
+  }
+  case LUA_TTABLE: {
+    if (ms->level == 0)
+      lua_pushlstring(L, s, e - s);  /* key is whole match */
+    else
+      push_onecapture(ms, 1);  /* or first capture */
+    lua_gettable(L, 3);
+    switch (lua_type(L, -1)) {
+    case LUA_TSTRING: case LUA_TNUMBER:
+      luaL_addvalue(b);  /* add string to accumulated result */
+      break;
+    case LUA_TNIL:
+      lua_pop(L, 1);
+      luaL_addlstring(b, s, e - s);  /* keep the match */
+      break;
+    default:
+      lua_pop(L, 1);  /* no replacement */
+      break;
+    }
+    break;
+  }
   }
 }
 
@@ -633,9 +657,10 @@
   int n = 0;
   MatchState ms;
   luaL_Buffer b;
-  luaL_argcheck(L,
-    lua_gettop(L) >= 3 && (lua_isstring(L, 3) || lua_isfunction(L, 3)),
-    3, "string or function expected");
+  int trepl = lua_type(L, 3);
+  luaL_argcheck(L, trepl == LUA_TSTRING || trepl == LUA_TNUMBER ||
+    trepl == LUA_TTABLE || trepl == LUA_TFUNCTION,
+    3, "string, function or table expected");
   luaL_buffinit(L, &b);
   ms.L = L;
   ms.src_init = src;