lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


> > See https://swtch.com/~rsc/regexp/regexp1.html for implementation in less than
> > 400 lines of C (probably less rewritten in a "modern" style).
> 
> That's some really nice code, but it's missing *at least* one
> essential feature that people expect
> from a regex engine: captures.
> 
> The author calls captures "submatch extraction" and claims that
> "Thompson-style algorithms
> can be adapted to track submatch boundaries without giving up
> efficient performance". If true,
> it would be nice to see how simple and small a fast implementation can be.

It also misses character classes. Although it shouldn't be difficult
to add them, I don't think the proposed implementation would handle
them nicely. Translating '[a-z]' to 'a|b|c|d|...|z' and then running
that non-deterministically would involve a lot of states. It also
doesn't show the 're2post' function, doesn't handle errors (e.g.,
out of memory), etc.

And, of course, it cannot implement back-references, as they are
NP-complete.

-- Roberto