- Subject: Re: yet another pattern-matching library for Lua
- From: roberto@... (Roberto Ierusalimschy)
- Date: Fri, 29 Dec 2006 14:30:37 -0200
> LPeg is an excellent library, Roberto.
> I had to #ifdef out the utf8 function on a certain operating system
> (guess which one) because MSVCRT.DLL doesn't support wctype!
This code was just a test. (Actually not even a test, as I could not
find a proper locale on my system to test that code.) This is why utf8
is not in the documentation. But the main point there was to show
how we can add new patterns in C, using the machinery of IFunc and
'newpattfunc'. Ideally, the library could export 'newpattfunc' (e.g.,
through the registry) so that external libraries could define new
patterns (like this utf8).
Another obvious extension is to allow new patterns in Lua; that is,
some way to make the machine call Lua functions, both to do pattern
matching and to handle captures. My idea is something like this:
Here, 'func' is called with arguments s (the subject string), i (the
current position), plus all captures from 'patt'. It returns either
nil (the match fails) or k (the new current position) plus an
arbitrary Lua value, which becomes the sole capture of this pattern.
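
A minimal sketch of that calling convention in plain Lua, without LPeg
itself. The names 'match_number' and 'call_func' are hypothetical and
only illustrate the protocol (s, i in; nil or k plus one value out);
they are not part of the library.

```lua
-- Hypothetical match function following the protocol described above:
-- it receives the subject s and the current position i, and returns
-- nil on failure or the new position plus one Lua value.
local function match_number(s, i)
  -- '^' anchors the search at position i
  local first, last = string.find(s, "^%d+", i)
  if not first then
    return nil                      -- match fails
  end
  return last + 1, tonumber(string.sub(s, first, last))
end

-- Hypothetical stand-in for the matching machine: it calls 'func' with
-- s, i and any captures from the inner pattern, then either fails or
-- advances to k and keeps the single returned value as the capture.
local function call_func(func, s, i, ...)
  local k, value = func(s, i, ...)
  if k == nil then
    return nil                      -- the real machine would backtrack here
  end
  return k, value
end

print(call_func(match_number, "x=42;", 3))   --> 5    42
print(call_func(match_number, "x=42;", 1))   --> nil
```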
(The motivation for allowing Lua functions to do captures is that the
table capture is very powerful, but it has a big limitation: it is
built only after the whole pattern matches. For large subjects, the
number of intermediate captures can be too large, overflowing the space
to keep these captures. Another option would be an "early table capture"
that would create the table as soon as its subpattern succeeds, even
if the resulting table may be discarded later, if an outer pattern
fails and the machine backtracks.)
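
To make the trade-off concrete, here is a rough sketch of a capture
stack in plain Lua. It is only a model under my own assumptions, not
how the library stores captures; 'new_stack', 'push', 'rollback' and
'fold' are hypothetical names. Folding early keeps the stack short, at
the cost of building a table that a later rollback may throw away.

```lua
-- A capture stack modeled as a Lua array with an explicit size.
local function new_stack()
  return { n = 0 }
end

local function push(stack, v)
  stack.n = stack.n + 1
  stack[stack.n] = v
end

-- Backtracking: discard everything captured after the saved mark.
local function rollback(stack, mark)
  for j = stack.n, mark + 1, -1 do
    stack[j] = nil
  end
  stack.n = mark
end

-- "Early table capture": as soon as the subpattern succeeds, replace
-- all entries above 'mark' by a single table holding them.
local function fold(stack, mark)
  local t = {}
  for j = mark + 1, stack.n do
    t[#t + 1] = stack[j]
  end
  rollback(stack, mark)
  push(stack, t)
end

local st = new_stack()
local mark = st.n                 -- position before the subpattern starts
push(st, "a"); push(st, "b"); push(st, "c")
fold(st, mark)                    -- three pending captures become one table
print(st.n, #st[1])               --> 1    3
rollback(st, mark)                -- an outer pattern failed: table discarded
print(st.n)                       --> 0
```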
Another open question is how to do substitutions, using this
technique of "whole matches" instead of searching. Mike Pall made
an interesting suggestion of specifying the substitutions in the
patterns themselves; e.g. like this:
... (lpeg.P("hi") / "hello") ...
That is, this pattern matches against "hi" and changes it to "hello".
Again, we have the problem of how to handle backtracking...
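
As a rough illustration of substitution by a whole match rather than
by searching, here is a plain Lua sketch. The function 'substitute' is
hypothetical and assumes a non-empty literal 'from'. It walks the whole
subject and, at each position, either rewrites 'from' to 'to' or copies
one character. Since it never backtracks, it sidesteps the problem
above instead of solving it.

```lua
-- Hypothetical whole-match substitution over a literal string.
-- Assumes 'from' is not the empty string.
local function substitute(s, from, to)
  local out = {}
  local i = 1
  while i <= #s do
    if string.sub(s, i, i + #from - 1) == from then
      out[#out + 1] = to                    -- matched: emit the replacement
      i = i + #from
    else
      out[#out + 1] = string.sub(s, i, i)   -- no match: copy one character
      i = i + 1
    end
  end
  return table.concat(out)
end

print(substitute("hi there, hi", "hi", "hello"))   --> hello there, hello
```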