"Rici Lake" <lua@ricilake.net> wrote:
(22/02/2005 13:53)

>
>crow said:
>> DATA=string.gsub(DATA,"\r\n\r\n(Author:.-\r\nEmail:)","\r\n\r\n××%1")
>> [/code]
>>
>> That looks like it will work, as it anchors the pattern to the double
>> newline, then "Author:", then non-greedy match to end of line, then
>> "Email:".
>
>I don't think you have quite the correct interpretation of non-greedy.
>Non-greedy is not a "fence" operator. The .- in that regex will not stop
>just because a \r is matched. It means "the shortest match which matches
>the pattern", so if the next line after Author: ... is not Email:, it will
>keep matching until it reaches an Email: line.
>
>Your "workaround" of changing . to [^\r] is actually the correct solution:
>i.e., if you don't want to match newlines, you have to say that
>explicitly.
>
>To put it another way, both "greedy" and "non-greedy" matches will find
>the earliest match for the entire pattern; the difference is that "greedy"
>matches the longest match at that point while "non-greedy" matches the
>shortest one. But in no case will a match be ignored.
>
>Hope that helps,
>Rici
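
For reference, here is a minimal sketch of the point made above about .- versus [^\r]-. The sample string is made up for illustration (it is not from the real data), and string.find is used only so the snippet runs on both older and newer Lua versions.

```lua
-- Hypothetical sample: the line after "Author:" is "Date:", not "Email:".
local data = "\r\n\r\nAuthor: crow\r\nDate: today\r\nEmail: a@b.c\r\n"

-- ".-" is non-greedy, but "." still matches "\r" and "\n", so the capture
-- runs across the Date: line until the first "\r\nEmail:".
local _, _, lazy = string.find(data, "\r\n\r\n(Author:.-\r\nEmail:)")
print(lazy == "Author: crow\r\nDate: today\r\nEmail:")  --> true

-- "[^\r]-" cannot cross a line ending, so this pattern only matches when
-- "Email:" is on the line directly after "Author:". Here it is not.
local _, _, fenced = string.find(data, "\r\n\r\n(Author:[^\r]-\r\nEmail:)")
print(fenced)  --> nil
```

So the non-greedy version does not stop at the end of the line; only the explicit [^\r] class does.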



Thank you. :)
Please look again though...

[quote]
It means "the shortest match which matches
the pattern", so if the next line after Author: ... is not Email:, it will
keep matching until it reaches an Email: line.
[/quote]

You are right there, but the point is that it HAD found the explicit match of "Email:" and continued to look further, or at least something happened that appeared to be this.

Where the post contained two quotes, it was still a post with an intact header. It failed to stop at the Email: tag. It captured up to the first quote, where Author: .. is followed on the next line by Date:
This quote was the content of the next capture, which extended to include the following post.
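
A rough sketch of the quoted-header case, using made-up data and a stand-in "**" marker (both are assumptions, not the real posts or the real marker). It only shows what the two patterns do with a quote whose Author: line is followed by Date:. Whether this is exactly what happened with the real data can't be told from here.

```lua
-- Made-up data: post 1 has a real header and a body that quotes an
-- "Author:" line followed by "Date:"; post 2 has a real header.
local posts = "\r\n\r\nAuthor: a\r\nEmail: a@x\r\nbody"
  .. "\r\n\r\nAuthor: b\r\nDate: d\r\nquoted text"
  .. "\r\n\r\nAuthor: c\r\nEmail: c@x\r\nbody2"

-- "**" is a stand-in marker, not the one from the original post.
local loose  = string.gsub(posts, "\r\n\r\n(Author:.-\r\nEmail:)", "\r\n\r\n**%1")
local strict = string.gsub(posts, "\r\n\r\n(Author:[^\r]-\r\nEmail:)", "\r\n\r\n**%1")

-- True if the given Author: line received a marker (plain-text find).
local function marked(s, name)
  return string.find(s, "**Author: " .. name, 1, true) ~= nil
end

-- With ".-" the quote starts a match that runs on into post 2's header,
-- so the quote is marked and post 2's own header is not.
print(marked(loose, "b"), marked(loose, "c"))    --> true   false

-- With "[^\r]-" the quote is skipped and both real headers are marked.
print(marked(strict, "b"), marked(strict, "c"))  --> false  true
```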