On May 3, 2012, at 3:02 PM, Emeka wrote:

> I stumbled on the above while going through Lua manual... I need your help to here.

The Lua manual explains "what", but it does not explain "why". _Programming in Lua_ often does explain "why" and I recommend it.

>      x = string.gsub("4+5 = $return 4+5$", "%$(.-)%$", function (s)
>            return loadstring(s)()
>          end)

Like "*", the "-" quantifier matches zero or more repetitions, but where "*" takes as many as possible, "-" takes as few as possible. It is common when iterating over delimited fields. Let's look at the difference between ".*" and ".-" here:

  > sub = "4 + 5, 3 + 4 = $return 4+5$, $return 3+4$"

  > s_star ="%$(.*)%$"
  > s_minus="%$(.-)%$"

  > =string.match(sub, s_star)
  return 4+5$, $return 3+4

  > =string.match(sub, s_minus)
  return 4+5
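
To tie this back to the gsub call in the question, here is a minimal sketch that iterates over the delimited fields and then expands them. The names compile, fields and expanded are mine, not from the manual; compile is just a shim, since loadstring exists in Lua 5.1 and Lua 5.2 and later use load for strings.

```lua
-- Assumption: pick whichever string-compiling function this Lua version has.
local compile = loadstring or load

local sub = "4 + 5, 3 + 4 = $return 4+5$, $return 3+4$"

-- Collect each delimited field; ".-" stops at the nearest closing "$".
local fields = {}
for body in string.gmatch(sub, "%$(.-)%$") do
  fields[#fields + 1] = body
end
print(fields[1])  --> return 4+5
print(fields[2])  --> return 3+4

-- Replace each field with the result of running it as a chunk.
local expanded = string.gsub(sub, "%$(.-)%$", function (body)
  return compile(body)()
end)
print(expanded)   --> 4 + 5, 3 + 4 = 9, 7
```

With ".*" in place of ".-", the loop would see a single field running from the first "$" to the last one, and the chunk would fail to compile.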

With a single-character delimiter, the same match can be written greedily by matching everything except the terminating character:

  > s_invclass ="%$([^$]*)%$"

  > =string.match(sub, s_invclass)
  return 4+5

But this would not work if you wished to match something like

  > sub2 = "4 + 5 = ${return 4+5}$"

since you can only invert a character set, not a sequence of characters. Ending a capture at a sequence works fine with ".-", though:

  > s_minus2="%${(.-)}%$"

  > =string.match(sub2, s_minus2)
  return 4+5
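
A small sketch of why the inverted class falls short here, assuming a made-up input (sub3) whose field itself contains a "}". The variable names are only for illustration.

```lua
-- Hypothetical input: the field contains a "}" that is not the terminator.
local sub3 = "n = ${return #{1,2,3}}$"

-- ".-" keeps extending until the whole sequence "}$" matches.
local lazy = string.match(sub3, "%${(.-)}%$")
print(lazy)         --> return #{1,2,3}

-- "[^}]*" stops at the first "}", which is followed by "}" rather than "$",
-- so the pattern cannot match at all.
local class_based = string.match(sub3, "%${([^}]*)}%$")
print(class_based)  --> nil
```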

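Lua patterns work on bytes, and UTF-8 encodes every non-ASCII codepoint as more than one byte. A set like [^...] would therefore exclude each of those bytes individually rather than the codepoint as a whole, so non-greedy matching is necessary for field matches terminating at a non-ASCII codepoint.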
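
Here is a rough illustration, assuming a field terminated by U+20AC (the euro sign) that also contains U+2192 (a right arrow). Both encode to three bytes starting with the same byte, 226. I wrote the bytes as decimal escapes so the example does not depend on the source file's encoding; the names are hypothetical.

```lua
local euro  = "\226\130\172"  -- U+20AC in UTF-8
local arrow = "\226\134\146"  -- U+2192 in UTF-8 (same first byte as euro)

local s = "x=a" .. arrow .. "b" .. euro .. " y"

-- Non-greedy: extends until the full three-byte terminator matches.
local lazy = string.match(s, "=(.-)" .. euro)
print(lazy == "a" .. arrow .. "b")  --> true

-- Inverted set: excludes the bytes 226, 130 and 172 one by one, so it
-- stops at the arrow's first byte and the terminator never lines up.
local inverted = string.match(s, "=([^" .. euro .. "]*)" .. euro)
print(inverted)                     --> nil
```

The inverted set happens to work when the field contains none of the terminator's bytes, which makes this kind of bug easy to miss.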

Jay