[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: Safe user-facing pattern matching.
- From: Coda Highland <chighland@...>
- Date: Thu, 30 Mar 2017 16:57:31 -0700
On Thu, Mar 30, 2017 at 12:55 PM, Jorge <email@example.com> wrote:
> On 30/03/17 13:10, Coda Highland wrote:
>> On Thu, Mar 30, 2017 at 8:38 AM, Jorge <firstname.lastname@example.org> wrote:
>>> Well, that does not make the program safe, it just kills it, no? :)
>>> I was thinking something Lua only, like pattern subset/sanitization or
>> In terms of computational theory, there's not a whole lot you can do
>> without sacrificing something significant somewhere. The biggest time
>> sink in a pattern-matching system is the need to perform backtracking
>> when an option fails.
> So, alternatively, what would be the most expressive non-backtracking
> pattern? For example, is there something more powerful than the * and +
> wildcard matching?
The problem isn't so much one of power and expressiveness. It's one of
Regular expressions deal with ambiguity using backtracking. Consider
the expression "(ab)*ac". It's simple enough, but it's impossible to
evaluate on ANY subject string without backtracking. (The expression
"a(ba)*c" on the other hand matches the same set of strings and can be
evaluated without backtracking, but it captures a different set of
On the opposite side of the conciseness spectrum, the following
pseudocode also matches the same set of strings, in strictly linear
time with no backtracking:
state = 1
for each character in subject:
if state == 1:
if character == 'a':
state = 2
else if state == 2:
if character == 'b':
-- If you wanted to, this would be a place
-- where you could emit a capture group
state = 1
else if character == 'c':
One of the local LPeg gurus could show you an example of some code
somewhere in between, but LPeg still uses backtracking. (It also
supports grammars of a higher order than regular expressions.)
There are many other syntaxes and formalisms in common use, of varying
complexities and power.
So the real question is... what do you want the user's pattern to LOOK
like? And how much CPU time are you willing to spend transforming the
user's pattern into something that the computer can work with? How
much work do you want to put into building the tools, if there isn't a
standard one available?
That said, this is a well-researched field of study. I'll point you to
http://hackingoff.com/compilers/regular-expression-to-nfa-dfa for a
place you can start looking.