- Subject: Re: Pattern matching bug
- From: Grant Robinson <jgrantr@...>
- Date: Wed, 8 Apr 2009 16:09:35 -0600
On Wed, Apr 8, 2009 at 4:03 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>> Consider the following patterns:
>> "(%w+)=([\"'])([^%2]-)%2"
>> "([\"'])([^%1]-)%1"
>>
>> If the first pattern is applied against the following strings:
>> 1) "key1='value1'"
>> 2) "key2='value2'"
>>
>> It will match #1 and not #2. Similarly, for the 2nd pattern, if it is
>> applied against those same two strings, it will match #2, and not #1.
>>
>> The trouble is in the sequence "([^%2]-)" which should be
>> "non-greedily match anything except the characters contained in
>> capture 2".
>> It appears that it is actually doing "non-greedily match anything except
>> the characters contained in capture 2 and the actual number 2".
>>
>> Thoughts? I have already found a workaround, but this pattern is
>> syntactically valid, it just doesn't work in the expected manner. I
>> took a look through lstrlib.c, but it was not immediately obvious
>> where the problem might be.
>
> It does not work in the manner you expect, but it does what the manual
> says it does. The "%2" you want is a 'pattern item', but there are no
> items inside character classes. Inside a class the '%' only escapes
> characters.
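
A small sketch of the behaviour described above, using the two test strings from the original report. The variable names are illustrative only. It compares the reported pattern with one where the set is written as a plain "[^2]"; if '%' inside a set merely escapes the next character, the two should behave identically.

```lua
-- Inside a set, "%2" is the escaped literal character "2", not a
-- back-reference, so "[^%2]" and "[^2]" should describe the same set.
local pat_escaped = "(%w+)=([\"'])([^%2]-)%2"
local pat_literal = "(%w+)=([\"'])([^2]-)%2"

local inputs = { "key1='value1'", "key2='value2'" }

for _, s in ipairs(inputs) do
  -- string.match returns the captures on success, or nil on failure;
  -- only the first capture (the key) is kept here.
  local with_escape = s:match(pat_escaped)
  local with_literal = s:match(pat_literal)
  print(s, with_escape, with_literal)
end
```

The second string fails with both patterns because its value contains the character "2", which the set excludes. The trailing "%2" outside the set is still a back-reference to the quote character.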
That makes sense. I guess I didn't realize that there are no pattern
items inside a set aside from the built-in character classes (%w and
the like). It seems like it would be useful to have this ability.
What do others think?
Grant
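
The workaround mentioned in the original report is not spelled out in the thread. One possible approach, offered here only as an assumption and not as the author's actual fix, is to drop the set entirely and let a non-greedy ".-" run up to the back-reference, which is a pattern item outside of any set:

```lua
-- Assumed workaround, not taken from the thread: ".-" matches as few
-- characters as possible, and "%2" then requires the same quote
-- character that opened the value.
local pat = "(%w+)=([\"'])(.-)%2"

local inputs = { "key1='value1'", "key2='value2'", 'key3="it\'s"' }

for _, s in ipairs(inputs) do
  local key, quote, value = s:match(pat)
  print(key, quote, value)
end
```

Because the match stops at the first occurrence of the opening quote character, a value may contain the other kind of quote, as in the third input.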