On Fri, Oct 17, 2008 at 5:10 AM, Shmuel Zeigerman wrote:
> By assigning string.sub to the __call metamethod of string type, one
> would be able to say var(i,j). Would it make sense to have it in Lua 5.2 ?

For any aggregate data structure, extracting its members is a very
basic and necessary operation, one in terms of which many other
operations can be defined.  Mathematics has short notations for it:
a_{k} or a[k] for indexing an array, tuple, or sequence, and f(k) for
applying a function or relation.  Here, Lua strings are understood as
sequences of chars or bytes.

Strings are a basic type in Lua, while the string library is (in
theory) optional.  The operations on strings could be divided between
those that can be performed without the string library and those that
cannot.  String operations that can be performed without the string
library include obtaining the length of a string (#s), concatenating
two strings (s1 .. s2), testing for equality (s1 == s2), comparing
lexicographically (s1 < s2), and converting a string to a number (s +
0).  Those are built-in operations, which can't even be redefined in
the string metatable (although __len might be redefinable in 5.2 [3]).
 The notable exception here is extracting the i-th character or the
character range i through j in a string[6].  (I'll treat these two
operations equivalently here since each can be implemented in terms of
the other.)  This is odd because most of the string library and most
any operation you might want could be defined in terms of string
indexing[1].  Even some of the built-in string operations such as
comparing, converting a string to a number, and perhaps obtaining the
length of a string are not so fundamental since they could be
redefined in terms of string indexing.
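
To make that last point concrete, here is a minimal sketch in C (the
language of the patch below).  Everything in it is an assumption made
for illustration: Str is a hypothetical model of a Lua string as a byte
sequence with an explicit length, and byte_at is a hypothetical
stand-in for the indexing primitive.  Comparison and a much simplified
string-to-number conversion are then written using nothing but byte_at
and the length.  Real Lua compares strings with strcoll and accepts far
more number formats than unsigned decimal digits, so this only shows
the idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a Lua string: bytes plus an explicit length. */
typedef struct { const char *data; size_t len; } Str;

static Str str_from (const char *s) {
  Str r;
  r.data = s;
  r.len = strlen(s);
  return r;
}

/* Hypothetical indexing primitive: the i-th byte (1-based),
   or -1 when i is outside the string. */
static int byte_at (Str s, size_t i) {
  return (i >= 1 && i <= s.len) ? (unsigned char)s.data[i - 1] : -1;
}

/* Bytewise lexicographic "a < b", built only on byte_at.
   (Not locale-aware, unlike Lua's own comparison.) */
static int str_less (Str a, Str b) {
  size_t i;
  for (i = 1; ; i++) {
    int ca = byte_at(a, i);
    int cb = byte_at(b, i);
    if (ca != cb) return ca < cb;  /* -1 (end of string) sorts first */
    if (ca == -1) return 0;        /* both ended together: equal */
  }
}

/* Simplified conversion: unsigned decimal digits only, no overflow
   check.  Returns 1 and stores the value on success, 0 otherwise. */
static int str_to_uint (Str s, unsigned long *out) {
  unsigned long n = 0;
  size_t i;
  if (s.len == 0) return 0;
  for (i = 1; i <= s.len; i++) {
    int c = byte_at(s, i);
    if (c < '0' || c > '9') return 0;
    n = n * 10 + (unsigned long)(c - '0');
  }
  *out = n;
  return 1;
}
```

With these definitions, str_less(str_from("ab"), str_from("abc")) is
true because the shorter string runs out first, and str_to_uint on
"1234" yields 1234.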

In, say, Python, strings are thought of as immutable tuples of chars
that can be indexed and sliced[4], so you have a notation like s[1],
s[-1], s[1:4], and s[2:].  That notation does pose a bit of a
challenge in Lua since s[i : f()] would normally be interpreted as a
method call in Lua rather than a range.  Shmuel suggested s(i, j).
Syntax, I think, is a secondary decision whose details can be worked
out later.

This should be considered in relation to the patch for indexing
strings with the "[]" operator[2].  If the notation "s(i,j)" is used
for string slicing, I think we should consider whether it should
actually be evaluated by the VM as a Lua function call defined by the
string metatable, or evaluated directly as a built-in operation that
merely happens to use the same syntax as a function call.  Should we
also allow varargs like s(...)?  Using a built-in operation seems more
consistent with Lua's treatment of operations on primitive types.  See
the rough patch below for the general idea.
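
For readers who want to try the index rules without building a patched
Lua, the following is a standalone sketch in plain C of the slicing
semantics the patch aims for.  posrelat follows the same rule as in the
patch; slice is a hypothetical helper of my own that writes into a
caller-supplied buffer instead of creating a Lua string, and a caller
would pass j = -1 to model an omitted second index.

```c
#include <stddef.h>
#include <string.h>

/* Same rule as posrelat in the patch: a negative position counts
   back from the end; anything still negative collapses to 0. */
static ptrdiff_t posrelat (ptrdiff_t pos, size_t len) {
  if (pos < 0) pos += (ptrdiff_t)len + 1;
  return (pos >= 0) ? pos : 0;
}

/* Hypothetical helper: copy s(i, j) into out, which must hold at
   least len + 1 bytes.  Returns the length of the result. */
static size_t slice (const char *s, size_t len,
                     ptrdiff_t i, ptrdiff_t j, char *out) {
  ptrdiff_t start = posrelat(i, len);
  ptrdiff_t end = posrelat(j, len);
  size_t n = 0;
  if (start < 1) start = 1;                        /* clamp to first char */
  if (end > (ptrdiff_t)len) end = (ptrdiff_t)len;  /* clamp to last char */
  if (start <= end) {
    n = (size_t)(end - start + 1);
    memcpy(out, s + start - 1, n);
  }
  out[n] = '\0';  /* an empty range yields the empty string */
  return n;
}
```

For the 5-byte string "hello", slice with (2, 4) gives "ell", (-3, -1)
gives "llo", and (4, 2) gives the empty string; out-of-range indices
such as (0, 100) are clamped and return the whole string.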

Related topic: indexing varargs is currently not a built-in operation
either, but rather a standard function (select).  Others have pointed
out its sometimes awkward execution behavior[5].

[1] http://lua-users.org/lists/lua-l/2008-09/msg00383.html
[2] http://lua-users.org/lists/lua-l/2008-09/msg00416.html
[3] http://lua-users.org/wiki/LuaFiveTwo
[4] http://en.wikibooks.org/wiki/Python_Programming/Strings#Indexing_and_Slicing
[5] http://lua-users.org/lists/lua-l/2005-09/msg00429.html
[6] http://en.wikipedia.org/wiki/String_manipulation_algorithm#substring

diff -ur lua-5.1.4/src/ldo.c lua-5.1.4-stringcall/src/ldo.c
--- lua-5.1.4/src/ldo.c	2008-01-18 17:31:22.000000000 -0500
+++ lua-5.1.4-stringcall/src/ldo.c	2008-10-18 12:26:45.171875000 -0400
@@ -261,9 +261,47 @@
    (condhardstacktests(luaD_reallocCI(L, L->size_ci)), ++L->ci))


+static ptrdiff_t posrelat (ptrdiff_t pos, size_t len) {
+  /* relative string position: negative means back from end */
+  if (pos < 0) pos += (ptrdiff_t)len + 1;
+  return (pos >= 0) ? pos : 0;
+}
+
+static void str_sub (lua_State *L, StkId str) {
+  size_t l = tsvalue(str)->len;
+  const char *s = getstr(rawtsvalue(str));
+  TValue * vstart = str+1;
+  TValue * vend = str+2;
+  ptrdiff_t start;
+  ptrdiff_t end;
+  if (ttype(vstart) != LUA_TNUMBER) luaG_runerror(L, "first index not a number");
+  start = posrelat(nvalue(vstart), l);
+  if (L->top - str < 3 || ttisnil(vend)) {
+    end = posrelat(-1, l);
+  }
+  else if (tonumber(vend,vend)) {
+    end = posrelat(nvalue(vend), l);
+  }
+  else luaG_runerror(L, "second index not a number");
+
+  if (start < 1) start = 1;
+  if (end > (ptrdiff_t)l) end = (ptrdiff_t)l;
+  L->top = str;
+  if (start <= end) {
+    setsvalue2s(L, L->top, luaS_newlstr(L, s+start-1, end-start+1));
+  }
+  else setsvalue2s(L, L->top, luaS_newlstr(L, "", 0));
+}
+
 int luaD_precall (lua_State *L, StkId func, int nresults) {
   LClosure *cl;
   ptrdiff_t funcr;
+
+  if (ttisstring(func)) {
+    str_sub(L, func);
+    return PCRC;
+  }
+
   if (!ttisfunction(func)) /* `func' is not a function? */
     func = tryfuncTM(L, func);  /* check the `function' tag method */
   funcr = savestack(L, func);