- Subject: Re: Pattern matching: good practices ?
- From: Ted Unangst <ted.unangst@...>
- Date: Sat, 26 Dec 2009 10:52:44 -0500
On Sat, Dec 26, 2009 at 4:39 AM, Vaughan McAlley <email@example.com> wrote:
> So is there a simple way of knowing when a pattern will (or might) backtrack?
Any pattern that uses * or + and then follows it with another pattern
that can match something the first one matched. .* is the likeliest to
cause trouble, because whatever comes after .* is guaranteed to also
match the .*, but patterns like "%s+ " can also cause trouble on long
strings of whitespace.
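To make the .* case concrete, here is a small sketch. It uses Python's re module as a stand-in for Lua patterns (an assumption for illustration; Lua's "(.*)X" and "([^X]*)X" behave the same way here, since both engines use greedy repetition with backtracking). The .* item first consumes the whole subject and then has to give characters back until X can match. A class that can never match X has nothing to give back.

```python
import re

# Python's re is used as a stand-in for Lua patterns; `.` and greedy `*`
# backtrack the same way in both for this simple case.
subject = "aXbXc"

# `.*` first swallows all five characters, then gives them back one at a
# time until the trailing `X` can match, so it stops at the LAST `X`.
greedy = re.match(r"(.*)X", subject)

# `[^X]*` can never match an `X`, so no character fits both items and the
# engine never has to give anything back; it stops at the FIRST `X`.
disjoint = re.match(r"([^X]*)X", subject)

print(greedy.group(1))    # aXb
print(disjoint.group(1))  # a
```

Note the two patterns are not interchangeable: one finds the last X and the other the first, so pick the disjoint form only when that is the match you actually want.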
The basic rule to follow is that for any two adjacent patterns, you
want each character of the input string to match one of them or
neither, but never both.
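One rough way to apply that rule mechanically is to test every character of a small alphabet against both adjacent items and see whether any character matches both. The helper below, `classes_overlap`, is a hypothetical name of my own, and it again uses Python's re syntax (`\s` for Lua's `%s`) as an assumption. It only checks ASCII, so it is a quick sanity check, not a proof. Also beware that Python's `.` skips "\n" by default while Lua's `.` matches any character.

```python
import re

def classes_overlap(first, second, alphabet=None):
    """Return the characters that fully match BOTH single-character patterns.

    An empty list means the two adjacent items are disjoint over the checked
    alphabet. By default only the 128 ASCII characters are checked.
    """
    if alphabet is None:
        alphabet = [chr(c) for c in range(128)]
    return [ch for ch in alphabet
            if re.fullmatch(first, ch) and re.fullmatch(second, ch)]

# "%s* " style: the space matches both items, so backtracking is possible.
print(classes_overlap(r"\s", " "))   # [' ']
# "%s*X" style: no character matches both items.
print(classes_overlap(r"\s", "X"))   # []
```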
If you really need to match "variable whitespace, followed by a space,
followed by X", as in "%s* X", and you are concerned about
performance, you can rewrite it as "(%s+)X" and then check after the
fact that the captured whitespace ends in a space. This is a little
slower in the best case, but has deterministic performance for
worst-case input.
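A minimal sketch of that rewrite follows, once more in Python's re as a stand-in for Lua (in Lua you would loop with string.find and its init argument instead of finditer). `find_space_x`, `SLOW` and `FAST` are hypothetical names. One detail the check has to handle: the leftmost "(%s+)X" match may end in a tab while a later one ends in a space, so this sketch keeps looking rather than giving up after the first failed check.

```python
import re

# Original form: whitespace, then a space, then X. `\s*` and the literal
# space both accept " ", so the engine may have to backtrack.
SLOW = re.compile(r"\s* X")

# Rewritten form: capture the whole whitespace run, then X. Nothing matched
# by `\s+` can also be X, so the two items never compete for a character.
FAST = re.compile(r"(\s+)X")

def find_space_x(s):
    """Return the (start, end) span of the leftmost whitespace run that ends
    in a space and is followed by X, or None if there is none.

    The "ends in a space" test is done on the captured run after matching,
    instead of inside the pattern.
    """
    for m in FAST.finditer(s):
        if m.group(1).endswith(" "):
            return m.span()
    return None

# The first run " \t" ends in a tab, so the later "  X" is the answer.
print(find_space_x("a \tX  X"))   # (4, 7)
```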