- Subject: Re: [Possibly Spam] Possible bug with non-greedy matching in gsub
- From: crow <crow_64a@...>
- Date: Tue, 22 Feb 2005 13:46:35 -0000
"Rici Lake" <email@example.com> wrote:
>> That looks like it will work, as it anchors the pattern to the double
>> newline, then "Author:", then non-greedy match to end of line, then
>I don't think you have quite the correct interpretation of non-greedy.
>Non-greedy is not a "fence" operator. The .- in that regex will not stop
>just because a \r is matched. It means "the shortest match which matches
>the pattern", so if the next line after Author: ... is not Email:, it will
>keep matching until it reaches an Email: line.
>Your "workaround" of changing . to [^\r] is actually the correct solution:
>i.e., if you don't want to match newlines, you have to say that.
>To put it another way, both "greedy" and "non-greedy" matches will find
>the earliest match for the entire pattern; the difference is that "greedy"
>matches the longest match at that point while "non-greedy" matches the
>shortest one. But in no case will a match be ignored.
>Hope that helps,
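The point about non-greedy matching can be checked with a small sketch. It uses Python's `re` module rather than Lua. Lua's `.-` is represented by `.*?`, and `re.DOTALL` makes `.` match line breaks the way `.` does in Lua patterns. The sample text and the exact pattern are hypothetical, because the full pattern from the thread is not quoted here.

```python
import re

# Hypothetical sample (not from the thread): the line after "Author:" is
# "Date:", not "Email:", and there are two "Email:" lines further on.
text = (
    "\n\nAuthor: alice\r\nDate: today\r\n"
    "Email: bob@example.com\r\nmore\r\n"
    "Email: carol@example.com\r\n"
)

# Lazy ".*?" stands in for Lua's ".-". DOTALL lets "." cross line breaks,
# so the capture keeps growing until "\r\nEmail:" can follow it.
lazy = re.search(r"\n\nAuthor:(.*?)\r\nEmail:", text, re.DOTALL)

# Greedy ".*" starts at the same place but takes the longest capture,
# running up to the LAST "\r\nEmail:".
greedy = re.search(r"\n\nAuthor:(.*)\r\nEmail:", text, re.DOTALL)

# Excluding "\r" (as in the [^\r] workaround) keeps the capture on one
# line. Here "Email:" is not on the next line, so nothing matches at all.
fenced = re.search(r"\n\nAuthor:([^\r]*)\r\nEmail:", text)

print(repr(lazy.group(1)))
print(repr(greedy.group(1)))
print(fenced)
```

Both the lazy and greedy searches begin at the same position. Only the capture length differs. Neither version stops at the first line break. The `[^\r]` version is the only one that refuses to cross a line.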
Please look again though...
>It means "the shortest match which matches
>the pattern", so if the next line after Author: ... is not Email:, it will
>keep matching until it reaches an Email: line.
You are right there, but the point is that it HAD found the explicit match of "Email:" and kept looking further. At least, something happened that looked like that.
Where the post contained two quotes, it was still a post with an intact header, yet the match failed to stop at the Email: tag. It captured up to the first quote, where "Author: .." is followed on the next line by "Date:".
This quote was the content of the next capture, which extended to include the following post.
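The following hypothetical input illustrates how the second symptom above could arise under leftmost, non-greedy semantics. In that symptom, a quote whose "Author:" line is followed by "Date:" swallows the following post. The sketch does not reproduce the first capture running past an "Email:" line. It uses Python's `re.findall`, which, like `gsub`, scans for successive non-overlapping matches. Lua's `.-` is represented by `.*?`. The post texts and the pattern are invented for illustration.

```python
import re

# Hypothetical input: post 1, then a quote that reproduces a header whose
# "Author:" line is followed by "Date:", then post 2.
posts = (
    "\n\nAuthor: alice\r\nEmail: a@example.com\r\nfirst post body\r\n"
    "\n\nAuthor: quoted\r\nDate: yesterday\r\nquoted text\r\n"
    "\n\nAuthor: bob\r\nEmail: b@example.com\r\nsecond post body\r\n"
)

# Non-greedy, "." crossing line breaks: the match that starts at the quote
# keeps going until the next "\r\nEmail:", which belongs to bob's post.
lazy_caps = re.findall(r"\n\nAuthor:(.*?)\r\nEmail:", posts, re.DOTALL)

# Capture restricted to one line: the quote cannot match, so bob's header
# is found on its own.
fenced_caps = re.findall(r"\n\nAuthor:([^\r]*)\r\nEmail:", posts)

print(lazy_caps)
print(fenced_caps)
```

The lazy version returns two captures. The second one begins at the quote and ends inside bob's header. The single-line version returns one capture per real header.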