[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Safe user-facing pattern matching.
- From: Martin <eden_martin_fuhrspam@...>
- Date: Sun, 23 Apr 2017 09:06:28 -0700
On 04/22/2017 08:51 PM, Coda Highland wrote:
> With a minimally greedy .- so that %b() matches the largest number of
> parentheses possible, it's still possible to do it in linear time:
It looks like that matching ".-%b" is almost equivalent to matching "%b"
which is done in linear time.
> 1. Find the first instance of ")" preceded by a "(". Return an empty
> match if not found.
> 2. Call the position of ( "left" and the position of ) "right". Set
> depth = 0 and lastGood = right.
> 3. Increment right.
> 4. If right goes off the end of the string, return [left ... lastGood].
> 5. If *right is not ( or ), return [left ... lastGood].
> 6. If *right == (, increment depth.
> 7. If *right == ), decrement depth.
> 8. If depth == 0, lastGood = right.
> 9. If depth < 0:
> 9a. Decrement left.
> 9b. If left < 0, return [0 ... lastGood]
> 9c. If *left == (, increment depth, lastGood = right.
> 9d. Else, return [left+1 ... lastGood].
> 10. Go to 3.
>
> No recursion, no arbitrary backtracking, not even a stack! No character
> in the input is evaluated more than three times.
Nice. If I understand correctly, idea is to find first occurrence of
"()" and then expand right in case of "(" or ")" and left in case of
unbalanced ")".
Concerning pseudocode, how it will handle "(()()" string? Looks like
it will return "()()" instead of first "()".
I think this may be fixed by expanding left each turn when <depth> was
zero at start of turn. Also we should save <lastGoodRight> and
<lastGoodLeft> each time when <depth> was zero at start of turn.
(Moving to left from first occurence of "()" we await to see only "("
chars. In other cases that char at left does not belong to balanced
parenthesis string.)
So pseudocode may be written as
1. Find the first instance of ")" preceded by a "(". Return an empty
match if not found.
2. Call the position of ( "left" and the position of ) "right". Set
depth = 0.
3.
while true
if (depth == 0)
lastGoodLeft = left
lastGoodRight = right
left = left - 1
depth = depth + 1
right = right + 1
if
(left < 1) or
(right > #s) or
(s[left] ~= '(') or
((s[right] ~= '(') and (s[right] ~= ')'))
then
return lastGoodLeft, lastGoodRight
if (s[right] == '(')
depth = depth + 1
else
depth = depth - 1
> On Sat, Apr 22, 2017 at 1:40 PM, Martin <eden_martin_fuhrspam@gmx.de
<mailto:eden_martin_fuhrspam@gmx.de>> wrote:
>> Yes, AFAIK it's impossible to match ".*%b()" in linear time. That
>> message was regarding FSA to match "%b()".
On 04/22/2017 08:51 PM, Coda Highland wrote:
> the problem becomes trivial: find the first instance of ")" preceded by
> a "(" and return that. :P
So I was wrong. It's possible to match ".*%b()" in linear time.
It is almost equivalent of matching "()". Thank you for showing it!
-- Martin