


On Sun, Oct 22, 2017 at 11:36 PM Sean Conner <sean@conman.org> wrote:

<snip>

Interesting analysis. One issue (some lines snipped):

  Pattern items:
                x*      lpeg.P"x"^0
                x+      lpeg.P"x"^1
                x?      lpeg.P"x"^-1

The issue here is that all of these Lua pattern tokens are greedy but not possessive. They first match the longest possible string, but if the rest of the pattern then fails, they backtrack one character and try again. This repeats until either the match succeeds or there are no more characters for them to give up.
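A quick way to see this backtracking is with plain string.match; the subject strings below are made up for illustration and need nothing beyond the standard library.

```lua
-- Each quantifier first takes as much as it can, then gives characters
-- back one at a time until the rest of the pattern can match.
local star  = ("aaab"):match("^a*ab$")  -- "a*" takes "aaa", then gives one "a" back
local plus  = ("aaab"):match("^a+ab$")  -- "a+" behaves the same way
local maybe = ("ab"):match("^a?ab$")    -- "a?" takes "a", then settles for ""
print(star, plus, maybe)
```

All three calls succeed, returning "aaab", "aaab" and "ab", and each one only succeeds because of the backtracking step.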

The corresponding LPeg pattern, on the other hand, is possessive: it matches as much as possible but never backtracks. So if the rest of the pattern fails to match after that point, the whole match simply fails, even if backtracking could have produced a successful match. This lack of backtracking is precisely why I raised the concern that LPeg may not be able to do everything that standard patterns can do.
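A minimal sketch of the contrast, assuming the lpeg module is installed: a direct translation of the Lua pattern "^a*ab$" into LPeg.

```lua
local lpeg = require "lpeg"
local P = lpeg.P

-- P"a"^0 consumes all three a's and never gives any back, so P"ab" is
-- tried only at the final "b" and the whole match fails.
local possessive = P"a"^0 * P"ab" * -1
print(possessive:match("aaab"))  -- nil
```

In fact this translation can never match anything: after P"a"^0 the next character is never an "a", yet P"ab" must start with one. The Lua pattern it was translated from matches "aaab" fine.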

The questions then are:
1. Can this limitation be worked around to create equivalent behavior?
2. If it can be worked around, how easy or complicated is it to do so?

(Come to think of it, my original example pattern shouldn't have used the "-" token; I should have written one that used * or + and needed backtracking to produce a successful match. My brain is a bit fried after a long day.)
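One commonly used way to get the backtracking behaviour back is to move the continuation inside a recursive grammar, so that LPeg's ordered choice does the backtracking for you. This is a sketch under that assumption; the helper names star_then and lazy_then are hypothetical, for illustration only.

```lua
local lpeg = require "lpeg"
local P, V = lpeg.P, lpeg.V

-- Hypothetical helper emulating Lua's "x*" followed by `rest` (greedy):
-- try "one more x, then recurse" first; if that whole branch fails,
-- ordered choice falls back to matching `rest` at the current position.
-- `x` must not match the empty string.
local function star_then(x, rest)
  return P{ x * V(1) + rest }
end

-- Hypothetical helper emulating Lua's "x-" followed by `rest` (lazy):
-- the same grammar with the two alternatives swapped.
local function lazy_then(x, rest)
  return P{ rest + x * V(1) }
end

-- Lua pattern "^a*ab$": `rest` must be the whole remainder, including the
-- end-of-subject check, because LPeg will not re-enter the grammar later.
local greedy = star_then(P"a", P"ab" * -1)
print(greedy:match("aaab"))  -- 5 (position just past the match)

-- Lua pattern "^.-b": stop at the first "b".
local lazy = lazy_then(P(1), P"b")
print(lazy:match("aabab"))   -- 4 (matched "aab")
```

As for question 2, the cost is that everything after the quantifier has to be threaded into the grammar by hand, so a simple left-to-right translation no longer works. Also, if x can match the empty string, LPeg will reject the grammar as potentially left-recursive.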
 
                %n      patt / "%n"
                %b()    Yes, see below ...
                %f[set] Erm ... I think (P(1) - set)^0

  I would have to play around with the %f[set] pattern, not being terribly
familiar with it, but I think the LPeg I have for it is correct if I
understand the documentation.

I don't think so. From my reading of the LPeg docs, this would actually advance the match position, which %f[set] does not do (it matches the empty string). Also, it never tests that the previous character is outside the set; it only looks at the characters that follow, and even there it consumes characters that are *not* in the set rather than checking that the next character *is* in the set.
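For reference, here is what %f[set] does with plain Lua patterns, using a made-up subject string. Note how the match is empty, so nothing is consumed.

```lua
local s = "hello world"

-- %f[%a] matches only where the previous character is not a letter and the
-- next one is; the start of the subject counts as '\0'. The match is empty,
-- so string.find returns an end index one less than the start index.
local first, last = s:find("%f[%a]", 2)
print(first, last)  -- 7 and 6: an empty match just before "world"

-- Because nothing is consumed, gsub can insert a marker at every frontier.
local marked = s:gsub("%f[%a]", "|")
print(marked)       -- |hello |world
```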

Possibly this could be done with a custom function through lpeg.Cmt? (In fact, I think that might be the only way, since you have to look backwards at the previous character, which LPeg seems to be bad at doing on its own.)
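A minimal sketch of the lpeg.Cmt idea, assuming the set is given as the body of a Lua character class (such as "%a") and reusing Lua's own matcher to test single characters. The helper name frontier is hypothetical.

```lua
local lpeg = require "lpeg"

-- Hypothetical helper: an LPeg analogue of Lua's %f[set] built on lpeg.Cmt.
-- `set` is a Lua character-class body such as "%a" or "%w_".
local function frontier(set)
  local class = "^[" .. set .. "]$"
  -- lpeg.P(true) matches the empty string, so the callback runs at the
  -- current position without consuming anything.
  return lpeg.Cmt(lpeg.P(true), function(subject, pos)
    -- Like Lua, treat the characters before the start and past the end as "\0".
    local prev = pos > 1 and subject:sub(pos - 1, pos - 1) or "\0"
    local cur = pos <= #subject and subject:sub(pos, pos) or "\0"
    if not prev:find(class) and cur:find(class) then
      return pos  -- succeed without moving the current position
    end
    return false  -- fail the match
  end)
end

local f = frontier("%a")
print(f:match("hello world", 7))  -- 7: a frontier just before "world"
print(f:match("hello world", 3))  -- nil: inside "hello"

-- Collect every word that starts at a frontier.
local word = f * lpeg.C(lpeg.R("az", "AZ")^1)
local words = lpeg.Ct((word + 1)^0)
```

Recent LPeg versions also provide lpeg.B, a lookbehind for fixed-length patterns, so for a one-character set pattern S something like -lpeg.B(S) * #S may avoid the Lua callback entirely; check that your LPeg version has it.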
 
  And I contend that the only reason LPeg looks difficult is that you are
not familiar with it.  I personally find it difficult to read Lua patterns
(and even regexes, which tend to look like line noise to me) but that's because
I rarely use them, instead using LPeg.

Quite possible. I learned full regular expressions before I came to Lua, so I found Lua patterns and their regex-like syntax easy to understand. LPeg is so radically different from those that it's hard for me to make the huge adjustment necessary to grasp it.