- Subject: Re: patch for string library
- From: RLake@...
- Date: Thu, 20 May 2004 15:19:10 -0400

I don't think this is a bug. I think it is expected behaviour. (At
least, it is the behaviour I would expect.)

> 2. Patterns which can match with empty string may matches twice
> at same position.
> For example,
> > = string.gsub("abc", ".*", "x")
> xx 2
> > = string.gsub("12ab", "%a*$", "x")
> 12xx 2
> These results should be "x 1" and "12x 1".

Why? The two matches are not in the same position. In the first case,
for example, the first time through, "abc" is matched at position 1;
the second time through, "" is matched at position 4.

If you didn't want that behaviour, you should use ".+".
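To see where the two matches in the first case come from, here is a rough sketch of the scanning loop described above: try a match anchored at the current position, jump past a non-empty match, and step over one character after an empty match or a failure. match_positions is a hypothetical helper, not part of the string library, and it assumes the pattern does not already start with "^". It models the gsub behaviour discussed in this thread; later Lua releases (5.4, for instance) no longer count an empty match that begins exactly where the previous match ended.

```lua
-- Hypothetical helper (not part of the string library): list every match
-- the gsub scanning loop described above would make, with its start.
-- Assumes `pat` does not already begin with "^", since we add our own anchor.
local function match_positions(s, pat)
  local results = {}
  local pos = 1
  local len = string.len(s)
  while pos <= len + 1 do
    -- "^" anchors the attempt at `pos`, one position at a time.
    local i, j = string.find(s, "^" .. pat, pos)
    if i then
      table.insert(results, { start = i, text = string.sub(s, i, j) })
    end
    if i and j >= i then
      pos = j + 1   -- non-empty match: resume right after it
    else
      pos = pos + 1 -- empty match or no match: step over one character
    end
  end
  return results
end

for _, m in ipairs(match_positions("abc", ".*")) do
  print(m.start, string.format("%q", m.text))
end
```

For "abc" and ".*" it reports "abc" at position 1 and "" at position 4; with ".+" only the first of those remains, and for "12ab" with "%a*$" it reports "ab" at 3 and "" at 5.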

My guess is that you were trying to do something like this:

   string.gsub(str, "([^\r\n]*)\r?\n?", "%1\n")

in an attempt to normalise line-endings, and found that strings which
were already terminated with a line-end now end with two. One way to
actually accomplish this is:

   string.gsub(str, "([^\r\n]*)(\r?\n?)",
     function(line, ending)
       if ending == "" then
         return line
       else
         return line .. "\n"
       end
     end)

or "simply":

   string.gsub(str, "([^\r\n]*)(\r?\n?)",
     function(line, ending)
       return line .. (ending == "" and "" or "\n")
     end)
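Wrapped up as a function, the "simply" version can be checked directly: lines that already end in a line-end come out with exactly one "\n", and feeding the output back in changes nothing. normalise_eol is a name made up for this sketch; it keeps only the first value gsub returns, since the replacement count is not needed here.

```lua
-- Hypothetical wrapper around the gsub call above: converts "\r\n", "\r"
-- and "\n" line-ends to "\n" without adding one where none existed.
local function normalise_eol(str)
  local result = string.gsub(str, "([^\r\n]*)(\r?\n?)",
    function(line, ending)
      -- Keep the line text; replace its ending (if it had one) with "\n".
      return line .. (ending == "" and "" or "\n")
    end)
  return result
end

local sample = "one\r\ntwo\rthree\nfour"
local once = normalise_eol(sample)
print(once == "one\ntwo\nthree\nfour")  --> true
print(normalise_eol(once) == once)      --> true
```

The captured ending is only tested for emptiness, so the same code handles DOS, old Mac and Unix line-ends, and a final line with no line-end is left without one.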

Another option is:

   string.gsub(str, "[\r\n]+",
     function(endings)
       -- e is the length of the first line-end in the run (1 or 2)
       local _, e = string.find(endings, "\r?\n?")
       return string.rep("\n", math.floor(string.len(endings) / e))
     end)

I'm sure there are others.
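The "[\r\n]+" version works on runs of line-end characters rather than on lines: it measures the first line-end in the run (2 characters for "\r\n", 1 for "\r" or "\n") and emits one "\n" per line-end of that size. The sketch below wraps it in a hypothetical squash_eol function. It assumes every line-end within a run has the same form, so a mixed run such as "\r\n\n" is miscounted; math.floor only keeps the count passed to string.rep a whole number in that case.

```lua
-- Hypothetical wrapper around the "[\r\n]+" approach above.
-- Assumes all line-ends within one run have the same form.
local function squash_eol(str)
  local result = string.gsub(str, "[\r\n]+",
    function(endings)
      -- e is the length of the first line-end in the run: 2 for "\r\n", else 1.
      local _, e = string.find(endings, "\r?\n?")
      -- One "\n" per line-end of that length.
      return string.rep("\n", math.floor(string.len(endings) / e))
    end)
  return result
end

print(squash_eol("a\r\n\r\nb") == "a\n\nb")  --> true
print(squash_eol("a\r\rb\n") == "a\n\nb\n")  --> true
```

Because the pattern only ever matches line-end characters, it never produces an empty match, which sidesteps the issue in this thread entirely.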

The second case seems a little weirder if you think that $ means
"match the terminator". It doesn't, though. It is, in perl-speak, a
zero-length assertion: it requires the match to end exactly at the end
of the string. In this case, you could get what I think you want with:

   string.gsub(str, "%a*$", "x", 1)
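A quick way to see that $ consumes nothing is to ask string.find where the matches are, using the "12ab" example from the report: the match for "%a*$" covers only "ab", and "$" on its own is an empty match just past the last character. The fourth argument to gsub then caps the number of replacements, so only the first match is replaced.

```lua
-- "%a*$" matches only "ab" (positions 3 to 4): "$" consumes nothing.
local first, last = string.find("12ab", "%a*$")
print(first, last)

-- "$" on its own is an empty match at position 5, one past the end.
local at, before = string.find("12ab", "$")
print(at, before)

-- A limit of 1 stops gsub after the first match.
local s, n = string.gsub("12ab", "%a*$", "x", 1)
print(s, n)

-- With no trailing letters, "%a*" matches nothing at the end, so an "x"
-- is still appended.
local t, m = string.gsub("12", "%a*$", "x", 1)
print(t, m)
```

The four print calls show 3 and 4, then 5 and 4, then 12x and 1, then 12x and 1. The last case is another zero-length match worth keeping in mind.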

In general, gsubs with zero-length matches need to be done cautiously,
and should be avoided if possible.
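When a pattern can match the empty string but only non-empty matches should be replaced, one cautious option is to pass gsub a function that leaves empty matches alone. gsub_nonempty is a hypothetical helper; it assumes the pattern has no captures, so that gsub passes the whole match to the function, and the replacement is inserted literally (no %1 expansion).

```lua
-- Hypothetical helper: like string.gsub(s, pat, repl) with a plain-string
-- repl, but empty matches are left as they are instead of being replaced.
-- Assumes `pat` has no captures, so the function receives the whole match.
local function gsub_nonempty(s, pat, repl)
  local result = string.gsub(s, pat, function(match)
    if match == "" then
      return match  -- empty match: substitute the empty string, i.e. no change
    end
    return repl     -- non-empty match: use the replacement text literally
  end)
  return result
end

print(gsub_nonempty("abc", ".*", "x"))     --> x
print(gsub_nonempty("12ab", "%a*$", "x"))  --> 12x
```

Both calls give the results the report expected. The count gsub returns may still include the empty matches, which is why the helper returns only the string.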