I have an idea for pattern matching:

After I wrote my regex library for Lua, it occurred to me to replace strfind
and gsub with versions using POSIX regexps. This seemed to uphold the
principle of not doing in Lua what can be done perfectly well outside it. (I
think Henry Spencer's regex package is pure ANSI C, or could easily be made
so, so there's no reliance on non-ANSI stuff.) On the other hand, the Lua
implementation of regexes is very small.

I think I might still do this for the Luas that I use, but then I had
another idea. My regex library can already be used with PCRE if you prefer
Perl regex syntax, because PCRE supports the POSIX calling API. It then
struck me that Lua's own regex package could be made to support the POSIX
API as well: regcomp would be a function that simply returned the pattern as
a string, and regexec would be a call to the matching function.
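
To make that concrete, here is a rough sketch of what such a shim might
look like. Everything named lp_* is hypothetical (the prefix is only there
to avoid clashing with a real <regex.h>), and builtin_match is a stand-in
that does a plain substring search; in a real build it would be a call into
Lua's existing pattern matcher.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical POSIX-shaped types; not part of Lua or of any real library. */
typedef struct { const char *pattern; } lp_regex_t;
typedef struct { long rm_so, rm_eo; } lp_regmatch_t;

/* Stand-in for Lua's built-in matcher (assumption: the real one would be
   called here). This version only does a literal substring search. */
static int builtin_match(const char *s, const char *pat, long *so, long *eo)
{
    const char *hit = strstr(s, pat);
    if (hit == NULL)
        return 0;
    *so = (long)(hit - s);
    *eo = *so + (long)strlen(pat);
    return 1;
}

/* "Compiling" just remembers the pattern string; nothing is translated.
   The pointer is borrowed, so the caller must keep the pattern alive. */
int lp_regcomp(lp_regex_t *re, const char *pattern, int cflags)
{
    (void)cflags;
    re->pattern = pattern;
    return 0;
}

/* Executing hands the stored pattern to the matcher. Returns 0 on a match
   and non-zero otherwise, in the style of POSIX regexec. */
int lp_regexec(const lp_regex_t *re, const char *s,
               size_t nmatch, lp_regmatch_t pmatch[], int eflags)
{
    long so, eo;
    (void)eflags;
    if (!builtin_match(s, re->pattern, &so, &eo))
        return 1;
    if (nmatch > 0) {
        pmatch[0].rm_so = so;
        pmatch[0].rm_eo = eo;
    }
    return 0;
}

/* Nothing was allocated, so freeing only clears the borrowed pointer. */
void lp_regfree(lp_regex_t *re)
{
    re->pattern = NULL;
}
```

Because the calling shape matches regcomp/regexec/regfree, code written
against these three functions would not need to know which matcher sits
behind them.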

So if the Lua string matching API were recast like this, then:

1. You could use the existing way of working with no change.

2. You could plug in your favourite POSIX-compatible regex library without
altering any code (just replacing one file with another in the Lua source).

This seems to increase flexibility without hurting anyone. The only problem
I can think of is that you might want to be able to use Lua and POSIX
matching in the same Lua system. This can be handled as follows: have a
#defined symbol that determines whether gsub and strfind are defined in
terms of the Lua pattern-matching functions, or those of the supplied regex
library.
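
A minimal sketch of such a switch follows. The macro name
STR_USE_POSIX_REGEX and the function str_find are made up for illustration,
and the non-POSIX branch uses a substring search as a placeholder for the
Lua pattern matcher. The POSIX branch uses the standard regcomp, regexec and
regfree from <regex.h>.

```c
#include <stddef.h>
#include <string.h>

#ifdef STR_USE_POSIX_REGEX   /* hypothetical build-time symbol */
#include <regex.h>

/* Find the first match of a POSIX extended regex in s.
   Returns the byte offset of the match, or -1 if there is none. */
long str_find(const char *s, const char *pattern)
{
    regex_t re;
    regmatch_t m;
    long pos = -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, s, 1, &m, 0) == 0)
        pos = (long)m.rm_so;
    regfree(&re);
    return pos;
}

#else

/* Placeholder for the built-in Lua matcher: literal substring search.
   Same signature, so callers do not change when the symbol is flipped. */
long str_find(const char *s, const char *pattern)
{
    const char *hit = strstr(s, pattern);
    return hit ? (long)(hit - s) : -1;
}

#endif
```

Building with -DSTR_USE_POSIX_REGEX selects the library-backed version;
leaving it off keeps the built-in behaviour.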

This gives you three configuration options:

1. As now.

2. Use your favourite regex library to provide new "regex" and "match"
functions, while leaving strfind and gsub working as before.

3. Use your preferred regex library to provide regex and match, and
reimplement strfind and gsub in terms of them.
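
As an illustration of option 3, here is a rough sketch of a global
substitution written purely in terms of the POSIX calls. The name regex_gsub
is hypothetical, the replacement is treated as a literal string (no capture
references), and the handling of empty matches is one possible choice that
need not agree with what Lua's gsub does. It assumes a system that provides
<regex.h>.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Replace every match of the extended regex `pattern` in `s` with the
   literal string `repl`. Returns a malloc'd string that the caller must
   free, or NULL if the pattern does not compile or memory runs out. */
char *regex_gsub(const char *s, const char *pattern, const char *repl)
{
    regex_t re;
    regmatch_t m;
    size_t slen = strlen(s), rlen = strlen(repl);
    size_t cap, len = 0;
    const char *p = s;
    int eflags = 0;
    char *out;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return NULL;

    /* Each loop iteration consumes at least one source character, except
       a final empty match at the end, so there are at most slen + 1
       replacements. Size the buffer for that worst case. */
    cap = slen + (slen + 1) * rlen + 1;
    out = malloc(cap);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }

    while (regexec(&re, p, 1, &m, eflags) == 0) {
        size_t so = (size_t)m.rm_so, eo = (size_t)m.rm_eo;
        memcpy(out + len, p, so);          /* text before the match */
        len += so;
        memcpy(out + len, repl, rlen);     /* the replacement itself */
        len += rlen;
        if (eo == so) {                    /* empty match */
            if (p[so] == '\0') {           /* at end of input: stop */
                p += so;
                break;
            }
            out[len++] = p[so];            /* copy one char to advance */
            p += so + 1;
        } else {
            p += eo;
        }
        eflags = REG_NOTBOL;               /* p is no longer line start */
    }
    strcpy(out + len, p);                  /* unmatched tail */
    regfree(&re);
    return out;
}
```

For example, regex_gsub("a1b22c", "[0-9]+", "#") yields "a#b#c". A strfind
replacement would be simpler still: one regcomp, one regexec, and the
offsets from regmatch_t.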

To reassure those interested in backwards compatibility: with option 1,
there is no change from the current state. With option 2, current programs
work as at present (unless they rely on "match" or "regex" being undefined);
new programs can take advantage of better regexes. Option 3 gives the best
solution for scripts that want to take advantage of rich regexes.

The changes needed are mostly in the build system, plus a little tweaking of
the Lua regex code (to support option 2), and will have no impact on
efficiency.

-- 
http://sc3d.org/rrt/ | egrep, n.  a bird that debugs bison