lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Apr 22, 2017 8:35 AM, "Martin" <eden_martin_fuhrspam@gmx.de> wrote:
On 04/22/2017 08:01 AM, Coda Highland wrote:
> On Sat, Apr 22, 2017 at 1:40 PM, Martin <eden_martin_fuhrspam@gmx.de> wrote:
>> Yes, AFAIK it's impossible to match ".*%b()" in linear time. That
>> message was regarding FSA to match "%b()".
>>
>> This is due dreaded left recursion: in ('(((((((())'):match('.*%b()')
>> for any unprocessed char we don't know whether it belongs to "%b()" or
>> to ".*" part.
>
> Nope, you can match it in linear time with a stack and a counter:
>
> 1. Push every character onto a stack until you hit the end of the string.
> 2. If the stack is empty, end.
> 3. Pop a character.
> 4. If the character was not '(' or ')', end.
> 4. If the character was ')', increment the counter by 1.
> 5. If the character was '(', decrement the counter by 1.
> 6. If the counter is negative, end.
> 7. Go to 2.
>
> /s/ Adam

Your algorithm matches with pattern ".-%b()$", not ".*%b()". It starts
with end of string and moves to beginning correctly capturing balanced
brackets in linear time.

  $ lua
  Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
  -- Added captures to pattern to highlight parts.
  > ('(((((((())'):match('(.*)(%b())')  -- original case, orig. string
  (((((((  ()
  > ('(((((((())'):match('(.-)(%b())$')  -- your case, original string
  ((((((  (())
  > ('(((((((())a'):match('(.*)(%b())')  -- original case, new string
  (((((((  ()
  > ('(((((((())a'):match('(.-)(%b())$')  -- your case, new string
  nil

-- Martin

Oh, good point. I forgot about putting stuff afterward. But with the clarification that .* is maximally greedy and %b is minimally greedy, the problem becomes trivial: find the first instance of ")" preceded by a "(" and return that. :P

With a minimally greedy .- so that %b() matches the largest number of parentheses possible, it's still possible to do it in linear time:

1. Find the first instance of ")" preceded by a "(". Return an empty match if not found.
2. Call the position of ( "left" and the position of ) "right". Set depth = 0 and lastGood = right.
3. Increment right.
4. If right goes off the end of the string, return [left ... lastGood].
5. If *right is not ( or ), return [left ... lastGood].
6. If *right == (, increment depth.
7. If *right == ), decrement depth.
8. If depth == 0, lastGood = right.
9. If depth < 0:
9a. Decrement left.
9b. If left < 0, return [0 ... lastGood]
9c. If *left == (, increment depth, lastGood = right.
9d. Else, return [left+1 ... lastGood].
10. Go to 3.

No recursion, no arbitrary backtracking, not even a stack! No character in the input is evaluated more than three times.

/s/ Adam