• Subject: Re: Argh! Regular expressions in Lua
• From: roberto@... (Roberto Ierusalimschy)
• Date: Wed, 22 Nov 2006 18:21:17 -0200

```> I applaud the principle. I am trying to understand what it is in
> the general algebra that causes the size and the complexity. Is it
> a question of bounding the number of times that the string has
> to be rescanned?

As soon as we have captures in patterns, the original theory of
regular expressions helps very little. The main idea behind regular
expressions is that a regular language does not depend on how it is
described. So you can do all kinds of manipulations on its description
and eventually arrive at a unique minimal deterministic finite automata
that recognizes the language. It does not matter if you write weird
things like ".*.*.*a", because that is simplified to ".*a".  Once you
have captures, the description of the language matters, and so most of
that theory does not apply any more.

Once the theory does not help, there is no real reason for a
pattern-matching engine to implement exactly regular expressions
(concatentation, kleene closures, and unions) instead of any other
set of fancy constructors (e.g., balanced brackets, or even full
context-free grammars).

It is not very difficult to extend Lua pattern matching with a
given particular construction. The problem is that each particular
construction will help only a small audience.  In the particular
case of both "generic repetition" and alternation, they pose two
small dificulties. The first was pointed out by Gavin, how to delimit
them. The second is that they need space for backtracking information
(one-character repetition uses only a counter). That's the reason
they are not implemented in Lua.

My dream would be something in the lines of Icon: a small set of
primitives in C and the rest written in Lua, so that it would be easy to
extend the system when necessary. One of the difficulties is that it is
not enough to have a fast-enough implementation. People now demand an
implementation which is at least as fast as Perl :)

-- Roberto

```