• Subject: Re: Safe user-facing pattern matching.
• From: Coda Highland <chighland@...>
• Date: Mon, 24 Apr 2017 13:18:18 -0700

```On Mon, Apr 24, 2017 at 12:12 PM, Gé Weijers <ge@weijers.org> wrote:
>
>
> On Sun, Apr 23, 2017 at 9:06 AM, Martin <eden_martin_fuhrspam@gmx.de> wrote:
>>
>> >> Yes, AFAIK it's impossible to match ".*%b()" in linear time. That
>> >> message was regarding FSA to match "%b()".
>>
>> So I was wrong. It's possible to match ".*%b()" in linear time.
>> It is almost equivalent of matching "()". Thank you for showing it!
>>
>
> You can match this particular expression in linear time going from right to
> left, because the language of the reverse string ("^%b().*\$") is a LR(1)
> language.
>
> Here's another problem:
>
>   "^a*(%bab)(%bbc)c*\$"
>
> This won't match if the number of b's is larger than the sum of the numbers
> of a's and c's. And "%bab" matches "a-+abb", for instance, whereas "a*"
> won't match any letters except 'a'.
>
> So of course there's an ad-hoc linear time algorithm to match this, but the
> point is: can we come up with a matching algorithm that writes that ad-hoc
> algorithm automatically, i.e. can we solve the general case?

That's an interesting challenge. Obviously the answer is "no" for the
most general case of all context-free grammars. If you refine the
question to "can we find a linear-time algorithm for an arbitrary CFG
if one exists" then I'm not sure; my intuition says it's still
impossible but we can asymptotically reduce the number of grammars
that have linear-time algorithms that we can't find.

If the question is restricted SOLELY to "can we find a linear-time
algorithm for regular languages + balanced delimiters"... I don't know
at all.

But if you go waaaaaay upthread, the actual point is: How can you
constrain the definition of string patterns in a way that's still
concise to write, allows you to match interesting things, and operates
in guaranteed linear time?

Constraining things to JUST pure regular languages satisfies the first
and third requirements, but there are obvious things that you would
want to handle that you can't.