lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



On Apr 28, 2017, at 23:44, Martin <eden_martin_fuhrspam@gmx.de> wrote:

On 04/24/2017 12:12 PM, Gé Weijers wrote:
Here's another problem:

 "^a*(%bab)(%bbc)c*$"

This won't match if the number of b's is larger than the sum of the
numbers of a's and c's. And "%bab" matches "a-+abb", for instance,
whereas "a*" won't match any letters except 'a'.

I didn't catch problem, could you provide sample strings that matched
and not matched by pattern?


“aaaaaaaaaaaaaabbbbccccccccccccc” will match
“abbbbbbc” won’t match



I think the more common questions besides this are

 * does current regexps language is powerful enough to match all
   possible strings that can be matched in linear time?
   (probably not)

No, definitely not. LL and LR parsers work in linear time, and they recognize languages inexpressible by any of the ‘extended regular _expression_’ systems I’ve seen (PCRE etc.). 

 * does such language exists?

It would be hard to formally describe the exact set of languages whose membership problem can be decided in time proportional to the size of the string. There are some oddballs in there, like strings that represent arithmetic expressions that evaluate to 42, and that does not have intermediate values outside the range [-1000, 1000]. Writing a linear time program to match that is not that hard, but coming up with a formalism that included this one but not anything that can’t be matched in linear time is probably impossible.