I have just realised that I don't think I ever posted this patch to
the list.  I have been using a modified behaviour for the handling of
the start anchor '^' in `gmatch` and `gsub`.  Essentially, a leading '^'
now means that each match must begin exactly where the preceding match
ended.  I find this very useful for ensuring that no unmatched
characters are skipped by the pattern.  It also lets me stop iterating
when a section delimiter (which doesn't match the pattern) is
encountered in the string.
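
To illustrate the intended semantics outside of Lua, here is a minimal
standalone sketch in C.  It is not the patched lstrlib.c code:
`match_item` and `count_items` are hypothetical helpers, and the
hard-coded matcher only stands in for a pattern like "%d+,?".  With the
anchored flag set, iteration stops at the first character that does not
match instead of skipping over it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Lua: matches one or more digits with
   an optional trailing comma at s (roughly the pattern "%d+,?").
   Returns the end of the match, or NULL if there is no digit at s. */
static const char *match_item(const char *s) {
  const char *e = s;
  while (isdigit((unsigned char)*e)) e++;
  if (e == s) return NULL;       /* need at least one digit */
  if (*e == ',') e++;            /* optional separator */
  return e;
}

/* Counts matches over s.  With anchored != 0, each match must start
   exactly where the previous one ended, so iteration stops at the first
   character that does not match.  With anchored == 0, unmatched
   characters are skipped one at a time. */
static int count_items(const char *s, int anchored) {
  int n = 0;
  assert(s != NULL);
  while (*s != '\0') {
    const char *e = match_item(s);
    if (e != NULL) { n++; s = e; }
    else if (anchored) break;    /* missed anchored match */
    else s++;                    /* otherwise, skip one character */
  }
  return n;
}
```

For the input "1,2,3;4,5" the anchored count is 3, because the ';'
acts as a section delimiter and ends the iteration, while the
unanchored count is 5.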

This is a breaking change, but the old behaviour is still accessible.

For `gmatch` the semantics change only slightly: previously an initial
'^' character was taken as a literal '^', so where the existing
behaviour is desired it is just a matter of escaping it as '%^', as
would be necessary in other patterns.

For `gsub` the start anchor currently works, but it anchors against the
start of the string only, which prevents iteration over the string.
The same behaviour can still be achieved by passing 1 as the 4th
argument, which restricts the substitution to a single occurrence.
For example, `string.gsub("hello", "^%a", string.upper)`, which
uppercases the first character of a string, would instead become
`string.gsub("hello", "^%a", string.upper, 1)`.
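
The effect of the 4th argument under the new semantics can be sketched
in standalone C as follows.  This is only an illustration, not the
library code: `upper_anchored` is a hypothetical stand-in for
`string.gsub(s, "^%a", string.upper, max_n)`, with a negative `max_n`
assumed to mean "no limit".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for gsub(s, "^%a", string.upper, max_n) under
   the patched semantics; max_n < 0 means no limit.  Writes the result
   to out, which must hold at least strlen(s) + 1 bytes, and returns the
   number of substitutions made. */
static int upper_anchored(const char *s, int max_n, char *out) {
  int n = 0;
  assert(s != NULL && out != NULL);
  while (*s != '\0' && (max_n < 0 || n < max_n)) {
    if (!isalpha((unsigned char)*s))
      break;                     /* missed anchored match */
    *out++ = (char)toupper((unsigned char)*s++);
    n++;
  }
  strcpy(out, s);                /* copy the unmatched remainder */
  return n;
}
```

In this sketch "hello" becomes "HELLO" without a limit and "Hello" with
a limit of 1, while "hello world" becomes "HELLO world" because the
space ends the run of consecutive matches.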

Anyway, I hope someone finds this useful, and it would be nice if this
made it into a future release.


Index: lstrlib.c
@@ -674,13 +674,16 @@ static int gmatch_aux (lua_State *L) {
   GMatchState *gm = (GMatchState *)lua_touserdata(L, lua_upvalueindex(3));
   const char *src;
   gm->ms.L = L;
+  int anchor = (*(gm->p) == '^') ? 1 : 0;
   for (src = gm->src; src <= gm->ms.src_end; src++) {
     const char *e;
-    if ((e = match(&gm->ms, src, gm->p)) != NULL && e != gm->lastmatch) {
+    if ((e = match(&gm->ms, src, gm->p+anchor)) != NULL && e != gm->lastmatch) {
       gm->src = gm->lastmatch = e;
       return push_captures(&gm->ms, src, e);
     }
+    else if (anchor)
+      break;
   }
   return 0;  /* not found */
@@ -786,10 +789,11 @@ static int str_gsub (lua_State *L) {
       add_value(&ms, &b, src, e, tr);  /* add replacement to buffer */
       src = lastmatch = e;
     }
+    else if (anchor)  /* missed anchored match */
+      break;
     else if (src < ms.src_end)  /* otherwise, skip one character */
       luaL_addchar(&b, *src++);
     else break;  /* end of subject */
-    if (anchor) break;
   }
   luaL_addlstring(&b, src, ms.src_end-src);