Nahuel Greco wrote:
>
> Maybe this is a FAQ, but it is not in the FAQ :)...

Perhaps not so frequently asked, but surely already asked :-)

> Why doesn't Lua use POSIX / Perl style regexes?

There are two reasons for this:
- As stated in the copyright notice, found e.g. in lua.h:
* This implementation contains no third-party code.
Implementing POSIX regexes from scratch would be quite a daunting task,
and an error-prone one too.

- More importantly, Lua code is simple and small. As Roberto (if I recall
correctly) stated, Lua regexes are implemented in quite a small number of
lines. Using POSIX or Perl regexes would require a much bigger
implementation.

Ah, I have found the original message (using the Yahoo Group search
facility...):
>>>
From:  Roberto Ierusalimschy <roberto@i...>
Date:  Thu Nov 16, 2000  1:12 pm
Subject:  Re: pmatch library

> Is there a good reason why this couldn't be added to standard Lua
> patterns?

I don't know how to do it in a *simple* way. Notice that, unlike most other
pattern-matching implementations, Lua does not use FSA; on the other hand,
the whole implementation has less than 400 lines (versus > 2000 for typical
FSA implementations). Also, the pattern-matching in Lua has some features
that can't be done (I guess) with FSA (such as %b and .-). So, I still
think the best solution is to keep Lua with a standard small implementation
for pattern-matching, and to add a new library with a full
regular-expression implementation. Unless, of course, someone finds a good
way to implement alternation into the current (recursive) style.

-- Roberto
<<<
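
To make the "recursive style" concrete, here is a minimal sketch in C of a
backtracking matcher with no FSA. It is not the code from the Lua sources:
all function names are made up for this illustration, and it only handles
literal characters, '.', a greedy '*' and a Lua-like lazy '-' (no classes,
no escapes, no %b).

```c
#include <stddef.h>

/* Hypothetical helper: does pattern char c match text char t?
   '.' matches any character; nothing matches the terminating NUL. */
static int single_match(char c, char t) {
    return t != '\0' && (c == '.' || c == t);
}

static const char *match_here(const char *p, const char *s);

/* Greedy repetition: consume as many characters as possible, then give
   them back one at a time until the rest of the pattern matches. */
static const char *max_expand(char c, const char *p, const char *s) {
    size_t n = 0;
    while (single_match(c, s[n])) n++;
    for (;;) {
        const char *e = match_here(p, s + n);
        if (e != NULL) return e;
        if (n == 0) return NULL;
        n--;
    }
}

/* Lazy repetition: try the rest of the pattern first, and consume one
   more character only when that fails. */
static const char *min_expand(char c, const char *p, const char *s) {
    for (;;) {
        const char *e = match_here(p, s);
        if (e != NULL) return e;
        if (!single_match(c, *s)) return NULL;
        s++;
    }
}

/* Match pattern p exactly at s. Returns a pointer just past the match,
   or NULL if there is no match at this position. */
static const char *match_here(const char *p, const char *s) {
    if (*p == '\0') return s;
    if (p[1] == '*') return max_expand(p[0], p + 2, s);
    if (p[1] == '-') return min_expand(p[0], p + 2, s);
    if (single_match(p[0], *s)) return match_here(p + 1, s + 1);
    return NULL;
}

/* Try every start position in turn. On success, store 0-based offsets
   (end is exclusive) and return 1; otherwise return 0. */
int tiny_find(const char *p, const char *s, size_t *start, size_t *end) {
    for (const char *t = s;; t++) {
        const char *e = match_here(p, t);
        if (e != NULL) {
            *start = (size_t)(t - s);
            *end = (size_t)(e - s);
            return 1;
        }
        if (*t == '\0') return 0;
    }
}
```

With the text "x<a><b>", the pattern "<.->" should stop at the first '>',
while "<.*>" runs to the last one. Alternation does not fit this scheme
easily, because each pattern item is handled as a single step with only one
way to continue.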

Reuben Thomas has made a Posix regex library:
>>>
From:  Reuben Thomas <rrt1001@c...>
Date:  Sat Nov 18, 2000  4:53 am
Subject:  Regexp library


I've just finished the first cut of a POSIX regexp library for Lua. It
simply exposes the POSIX regcomp and regexec functions (regfree is hidden in
a gc tag method). The functions provided are:

regex(p): returns the compiled regex (a userdata) corresponding to the
pattern p (a string)

match(t, r): returns a triplet of start position, end position, table of
substring matches for the text string t and compiled regex r.

It seems slightly more useful to have a separate regex function than not,
because a compiled regex can then be used repeatedly (it wouldn't be awfully
inefficient to reimplement gsub with it).

One last feature, untested as yet: if you have Henry Spencer's regex
library, then NULs are allowed in the pattern and text.

If anyone would like the source for this library (about 3Kb) please email
me; I'll make it available on my Lua page soon, along with the bitwise
operations library I posted to the list earlier.

--
http://sc3d.org/rrt/ | impatience, n.  the urge to do nothing
<<<
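
For readers who have not used the POSIX API that this library wraps, here is
a rough C sketch of the compile and match steps. It is not Reuben's source:
rx_compile, rx_match, match_result and MAX_SUBS are hypothetical names, and
the Lua binding layer (userdata, gc tag method) is left out. Only regcomp,
regexec and regfree are the real POSIX calls.

```c
#include <regex.h>
#include <stddef.h>

#define MAX_SUBS 10  /* assumed cap on reported subexpressions */

/* Hypothetical record mirroring the (start, end, substring matches)
   triplet. Offsets are 0-based bytes with an exclusive end, as POSIX
   reports them; a Lua binding would likely convert to 1-based positions. */
typedef struct {
    int matched;
    regoff_t start, end;
    size_t nsubs;
    regoff_t sub_start[MAX_SUBS], sub_end[MAX_SUBS];
} match_result;

/* Compile an extended regex; returns 0 on success, like regcomp. */
int rx_compile(regex_t *re, const char *pattern) {
    return regcomp(re, pattern, REG_EXTENDED);
}

/* Run a compiled regex against text and collect the whole match plus
   the parenthesised subexpressions. */
match_result rx_match(const regex_t *re, const char *text) {
    match_result r = {0};
    regmatch_t m[MAX_SUBS + 1];
    if (regexec(re, text, MAX_SUBS + 1, m, 0) != 0)
        return r;  /* no match (or error): matched stays 0 */
    r.matched = 1;
    r.start = m[0].rm_so;
    r.end = m[0].rm_eo;
    r.nsubs = re->re_nsub < MAX_SUBS ? re->re_nsub : MAX_SUBS;
    for (size_t i = 0; i < r.nsubs; i++) {
        r.sub_start[i] = m[i + 1].rm_so;
        r.sub_end[i] = m[i + 1].rm_eo;
    }
    return r;
}
```

The caller keeps the regex_t around and calls rx_match as often as needed,
then calls regfree once; that is the reuse a separate compile step allows.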

>>>
From:  Reuben Thomas <rrt1001@c...>
Date:  Sat Feb 10, 2001  12:20 am
Subject:  Idea for supporting better regexps


I have an idea for pattern matching:

After I wrote my regex library for Lua, it occurred to me to replace strfind
and gsub with versions using POSIX regexps. This seemed to uphold the
principle of not doing in Lua what can be done perfectly well outside it (I
think Henry Spencer's regex package is (or could easily be made) pure ANSI
C, so there's no reliance on non-ANSI stuff). On the other hand, the Lua
implementation of regexes is very small.

I think I might still do this for the Luas that I use, but then I had
another idea: my regex library can already be used with PCRE if you prefer
Perl regex syntax, because PCRE supports the POSIX calling API. It then
struck me that the Lua regex package could be made to support the POSIX API
as well. regcomp would be a function that simply returned the pattern as a
string, and regexec would be a call to the matching function.

So if the Lua string matching API were recast like this, then:

1. You could use the existing way of working with no change.

2. You could plug in your favourite POSIX-compatible regex library without
altering any code (just replacing one file with another in the Lua source).

This seems to increase flexibility without hurting anyone. The only problem
I can think of is that you might want to be able to use Lua and POSIX
matching in the same Lua system. This can be handled as follows: have a
#defined symbol that determines whether gsub and strfind are defined in
terms of the Lua pattern-matching functions, or those of the supplied regex
library.

This gives you three configuration options:

1. As now.

2. Use your favourite regex library to provide the "regex" and "match"
functions, while leaving strfind and gsub working as before.

3. Use your preferred regex library to provide regex and match, and
reimplement strfind and gsub in terms of them.

To reassure those interested in backwards compatibility: with option 1,
there is no change from the current state. With option 2, current programs
work as at present (unless they rely on "match" or "regex" being undefined);
new programs can take advantage of better regexes. Option 3 gives the best
solution for scripts that want to take advantage of rich regexes.

The changes needed are mostly in the build system, plus a little tweaking of
the Lua regex code (to support option 2), and will have no impact on
efficiency.

--
http://sc3d.org/rrt/ | egrep, n.  a bird that debugs bison
<<<
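
The proposal above might look roughly like this in C. This is only a sketch
under assumptions: USE_POSIX_REGEX, pat_t, pat_comp, pat_exec, pat_free and
find_span are invented names, and a plain strstr search stands in for the
built-in Lua matcher. The point is that the "compile" step of the built-in
backend just keeps the pattern string, so callers cannot tell the two
backends apart.

```c
#include <stddef.h>
#include <string.h>

#ifdef USE_POSIX_REGEX  /* hypothetical build switch */
#include <regex.h>

typedef regex_t pat_t;

static int pat_comp(pat_t *p, const char *pattern) {
    return regcomp(p, pattern, REG_EXTENDED);
}

static int pat_exec(const pat_t *p, const char *text,
                    size_t *start, size_t *end) {
    regmatch_t m[1];
    if (regexec(p, text, 1, m, 0) != 0) return 1;
    *start = (size_t)m[0].rm_so;
    *end = (size_t)m[0].rm_eo;
    return 0;
}

static void pat_free(pat_t *p) { regfree(p); }

#else  /* built-in backend */

/* "Compiling" just remembers the pattern string. */
typedef struct { const char *pattern; } pat_t;

static int pat_comp(pat_t *p, const char *pattern) {
    p->pattern = pattern;
    return 0;
}

/* Stand-in for the built-in matcher: a literal substring search. */
static int pat_exec(const pat_t *p, const char *text,
                    size_t *start, size_t *end) {
    const char *hit = strstr(text, p->pattern);
    if (hit == NULL) return 1;
    *start = (size_t)(hit - text);
    *end = *start + strlen(p->pattern);
    return 0;
}

static void pat_free(pat_t *p) { (void)p; }

#endif

/* Caller code is identical for both backends. Returns 1 on a match and
   stores 0-based offsets (end exclusive), otherwise returns 0. */
int find_span(const char *pattern, const char *text,
              size_t *start, size_t *end) {
    pat_t p;
    if (pat_comp(&p, pattern) != 0) return 0;
    int ok = (pat_exec(&p, text, start, end) == 0);
    pat_free(&p);
    return ok;
}
```

Building with -DUSE_POSIX_REGEX would select the library backend; leaving it
undefined keeps the built-in one. Functions such as strfind and gsub would
be written once on top of pat_comp and pat_exec.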

Regards.

--._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.--
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
http://jove.prohosting.com/~philho/
--´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`·._.·´¯`--