I don't think this is a bug. I think it is expected behaviour. (At least,
it is the behaviour I would expect.)

> 2. Patterns which can match with empty string may matches twice
> at same position.
> For example,
> > = string.gsub("abc", ".*", "x")
> xx      2
>  > = string.gsub("12ab", "%a*$", "x")
> 12xx    2

> These results should be "x  1" and "12x  1".

Why? The two matches are not in the same position. In the first
case, for example, the first time through "abc" is matched at
position 1; the second time through "" is matched at position 4.
If you didn't want that behaviour, you should use ".+".
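
To see where the two matches come from without involving gsub at all,
here is a small sketch using string.find with an explicit start position.
The variable name s is just for illustration.

```lua
-- Where does ".*" match in "abc"? string.find takes a starting position.
local s = "abc"

-- Starting at position 1 the pattern consumes the whole string.
print(string.find(s, ".*", 1))    --> 1   3

-- Starting just past the last character it still matches, but the match
-- is empty: it begins at position 4 and "ends" at position 3.
print(string.find(s, ".*", 4))    --> 4   3

-- ".+" needs at least one character, so nothing matches at position 4
-- and gsub makes a single replacement.
print(string.find(s, ".+", 4))    --> nil
print(string.gsub(s, ".+", "x"))  --> x   1
```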

My guess is that you were trying to do something like this:
  string.gsub(str, "([^\r\n]*)\r?\n?", "%1\n")
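in an attempt to normalise line-endings, and found that strings
which were already terminated with a line-end now have two: the
pattern matches the empty string at the very end once more, and the
replacement appends a "\n" to that as well. One way to actually
accomplish this is: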
  string.gsub(str, "([^\r\n]*)(\r?\n?)",
              function(line, ending)
                if ending == "" then
                  return line
                else
                  return line .. "\n"
                end
              end)

or "simply":
  string.gsub(str, "([^\r\n]*)(\r?\n?)",
              function(line, ending)
                return line .. (ending == "" and "" or "\n")
              end)
Another option is:
  string.gsub(str, "[\r\n]+",
              function(endings)
                -- e is the length of the first line-end in the run (1 or 2)
                local _, e = string.find(endings, "\r?\n?")
                return string.rep("\n", math.floor(string.len(endings) / e))
              end)

I'm sure there are others.
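
For what it's worth, here is a minimal sketch that wraps the function-based
version in a helper and checks it on a few inputs. The name normalise_eol
is hypothetical, and the sample strings are only illustrations.

```lua
-- Hypothetical helper: rewrite every "\r\n", lone "\r" or lone "\n" as "\n".
function normalise_eol(str)
  local result = string.gsub(str, "([^\r\n]*)(\r?\n?)",
    function(line, ending)
      -- An empty ending means the match ran into the end of the string
      -- (possibly with an empty line too), so nothing is appended.
      return line .. (ending == "" and "" or "\n")
    end)
  return result  -- drop gsub's second result, the match count
end

assert(normalise_eol("one\r\ntwo\nthree") == "one\ntwo\nthree")
assert(normalise_eol("one\r\n") == "one\n")   -- no doubled line-end
assert(normalise_eol("a\n\nb") == "a\n\nb")   -- blank lines survive
```

Because an empty match is replaced by the empty string here, it makes no
difference whether gsub tries the pattern once more at the end of the
string or not.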

The second case seems a little weirder, if you think that $ means
"match the terminator". It doesn't, though. It is, in perl-speak,
a zero-length assertion: it requires the match to end at the end of
the string, and consumes nothing itself. In this case, you could
get what I think you want with:

  string.gsub(str, "%a*$", "x", 1)
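
As a quick sketch of the above on the original example, alongside the
"use + instead of *" alternative (the variable s is just for illustration):

```lua
local s = "12ab"

-- Capping gsub at one replacement: the first successful match is the
-- trailing "ab", and the empty match after it is never attempted.
print(string.gsub(s, "%a*$", "x", 1))  --> 12x   1

-- Alternatively, require at least one letter so that the empty string
-- at the end cannot match at all.
print(string.gsub(s, "%a+$", "x"))     --> 12x   1
```

The two are not interchangeable: on a string with no trailing letters,
such as "12", the first still matches the empty string at the end and
appends an "x", while the second leaves the string alone.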

In general, gsubs with patterns that can match the empty string need to
be done cautiously, and should be avoided if possible.