Shmuel Zeigerman said:
> <Quote from Lua 5.1 alpha manual>
>      Changes in the Libraries
> Function string.find does not return its captures.
> Use string.match for that.
> <End Quote>
>
> It seems to me that it is a step backwards.
>
> Before this change, it was possible to traverse texts using
> string.find only and obtain everything: whole matches,
> captures and the pieces lying between the matches.
>
> To get this functionality with Lua 5.1, one should use
> 2 functions instead: string.find and string.match.

I completely agree with this. The string.match interface is convenient,
and will replace the use of string.find in many cases, but it cannot
replace all uses.

Furthermore, the change to string.find will require a non-trivial rewrite
of practically every script in my project. It is not simply a matter of
replacing one function with another, because many uses of string.find
deliberately retrieve both the extent and the captures in one call. This
seems inappropriate for a point release, and in any case unnecessary.
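
To illustrate the kind of code affected, here is a minimal sketch of a
traversal that takes the extent and the capture from a single call. The
function name split_fields and the bracket pattern are invented for this
example; it assumes string.find returns its captures after the start and
end positions, which is the traditional behaviour.

```lua
-- Hypothetical helper: walk a string, collecting the bracketed keys
-- (captures) and the text lying between the matches.
local function split_fields(s)
  local pieces, keys = {}, {}
  local pos = 1
  while true do
    -- start/stop give the extent of the match, key is the first capture;
    -- all three come from one call.
    local start, stop, key = string.find(s, "%[(%w+)%]", pos)
    if not start then break end
    table.insert(pieces, string.sub(s, pos, start - 1)) -- text before match
    table.insert(keys, key)
    pos = stop + 1                                      -- resume after match
  end
  table.insert(pieces, string.sub(s, pos))              -- trailing text
  return pieces, keys
end

local pieces, keys = split_fields("a[x]b[y]c")
-- pieces is { "a", "b", "c" } and keys is { "x", "y" }
```

If string.find stopped returning captures, each iteration would need a
second call on the same text just to recover key.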

Please do add string.match, which is much more convenient for the case
where one is simply matching a pattern with a string, avoiding the
necessity for the "local _, _, ..." idiom, but retain the traditional
semantics of string.find for the cases where the string is being traversed
for multiple instances of a pattern. (Yes, string.gfind -- or
string.gmatch -- is helpful, but it is not universal either.)
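
For the simple matching case, the difference looks roughly like this. The
sample line and pattern are made up; the second form assumes string.match
is available (Lua 5.1 or later), and the first assumes string.find still
returns captures.

```lua
local line = "width = 640"

-- Traditional idiom: discard the extent, keep only the captures.
local _, _, key, value = string.find(line, "^(%w+)%s*=%s*(%d+)$")

-- With string.match the captures come back directly.
local key2, value2 = string.match(line, "^(%w+)%s*=%s*(%d+)$")
```

Both calls yield "width" and "640" here; the second simply avoids the two
placeholder variables.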

In case it's helpful, consider the use case of the lexer library I have
been using for several years, available as part of LuaParse. The lexer
takes a table of lexical patterns and functions, matches every pattern
against the current input position in the target string, and selects the
pattern with the longest match; it then calls the corresponding function
with the captures from this match. This could be done with the "new"
string.find/string.match, but only by performing a redundant match to
extract the captures. The current implementation, though not rapid,
benefits from the fact that most non-matches fail on the first character;
in general, only successful matches scan the entire lexeme.
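
A rough sketch of that longest-match loop follows. This is not the
LuaParse code: the rule table, the names next_token and tokenize, and the
token format are all invented for illustration, and it assumes string.find
returns captures after the extent. Each pattern is anchored with "^" so
that it is tried only at the current position.

```lua
local unpack = table.unpack or unpack  -- name differs between Lua versions

-- Hypothetical rule table: an anchored pattern paired with a handler.
local rules = {
  { pattern = "^(%d+)",         action = function(t) return "number", t end },
  { pattern = "^(%d+%.%d+)",    action = function(t) return "number", t end },
  { pattern = "^([%a_][%w_]*)", action = function(t) return "name", t end },
  { pattern = "^([=+%-*/])",    action = function(t) return "op", t end },
  { pattern = "^%s+",           action = function() end }, -- skip whitespace
}

-- Try every rule at pos and keep the one whose match ends furthest right.
local function next_token(input, pos)
  local best_stop, best_rule, best_found
  for _, rule in ipairs(rules) do
    -- One call per rule yields both the extent and the captures.
    local found = { string.find(input, rule.pattern, pos) }
    if found[1] and (not best_stop or found[2] > best_stop) then
      best_stop, best_rule, best_found = found[2], rule, found
    end
  end
  if not best_rule then return nil end
  -- Entries 3 and up are the captures; pass them to the handler.
  return best_stop + 1, best_rule.action(unpack(best_found, 3))
end

local function tokenize(input)
  local tokens, pos = {}, 1
  while pos <= string.len(input) do
    local next_pos, kind, text = next_token(input, pos)
    if not next_pos then error("no rule matches at position " .. pos) end
    if kind then table.insert(tokens, kind .. ":" .. text) end
    pos = next_pos
  end
  return tokens
end

local tokens = tokenize("x1 = 3.14")
-- tokens is { "name:x1", "op:=", "number:3.14" }
```

Without captures from string.find, next_token would have to run the
winning pattern a second time through string.match before calling the
handler, which is the redundant match described above.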