lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On 04/20/2017 05:48 PM, Gé Weijers wrote:
> On Mon, Apr 10, 2017 at 13:38 Martin <eden_martin_fuhrspam@gmx.de
> <mailto:eden_martin_fuhrspam@gmx.de>> wrote:
> 
>     On 04/05/2017 06:14 PM, Gé Weijers wrote:
>     > I don't yet see how to do this in linear time by bolting a recognizer
>     > for the "%b()" pattern onto a finite state automaton.
> 
>     I think this is impossible for canonic finite state automata (which is
>     just a series of triples (state, cur_char, next_state) from limited
>     alphabet).
> 
>     But is possible if we add some "condition" and "action" fields
>     that contain functions and procedures:
> 
>     FSA (finite state automata):
> 
>      state | cur_char |   cond   |            action          | next_state
>     -------+----------+----------+----------------------------+-----------
>      SCAN  |   EOL    |          |                                 | DONE
>      SCAN  |    (     |          | starts[++deep] = POS            | SCAN
>      SCAN  |    )     | deep > 0 | capt = seg(starts[deep--], POS) | SCAN
>      SCAN  |    )     |          |                                 | SCAN
>      SCAN  |   ANY    |          |                                 | SCAN
> 
> 
> A regular expression like ".*%b()" will match "(((((((())", which your
> automaton will not unless it's a nondeterministic one. 
> You cannot know how many '('s are matched by ".*" until you get to the
> end of the string.
> 
> I don't think there's a linear time algorithm to match expressions like
> this in the general case. You either backtrack, or your deterministic
> state representation is a list only bounded by the input size. 
> 
> -- Gė

Yes, AFAIK it's impossible to match ".*%b()" in linear time. That
message was regarding FSA to match "%b()".

This is due dreaded left recursion: in ('(((((((())'):match('.*%b()')
for any unprocessed char we don't know whether it belongs to "%b()" or
to ".*" part.

So we're diving, moving all chars to ".*" part until end of string,
where is no space for "%b()". Then, one level up we're moving last ")"
to %b part just to fail again. Finally we succeed in matching "()" as
"%b()" and return this result:

Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> ('(((((((())'):match('.*%b()')
(((((((()

BTW there is tricky part: probably you want "(())" in %b part.
In this case pattern should be ".-%b()":

> ('(((((((())'):match('(.-)(%b())')
((((((  (())

-- Martin