> > Okay, then this should do it:
> > lpeg.B(lpeg.P(1) - set) * #set + #set
> > About lpeg.B():
> > Returns a pattern that matches only if the input string at the
> > current position is preceded by patt. Pattern patt must match only
> > strings with some fixed length, and it cannot contain captures.
> > Like the and predicate, this pattern never consumes any input,
> > independently of success or failure.
> > The bit with lpeg.B() checks the character before the current position,
> > and "#set" checks that the next character is in the set without consuming
> > any input. The "+ #set"
> > is for the case when we're at the start of the input and there is no input
> > prior to the frontier.
> Hmm. I had completely missed the B function. This looks like it would work
> in the middle of the string. However, frontier patterns have interesting
> behaviors at either end of the subject string. Specifically, they will
> treat the string as if it is both prefixed and suffixed by a "\0". I'm not
> sure how this would be implemented. I think you would have to read the set
> to decide if "\0" is in the set or not, then write a P(-1) or P(1) as
> necessary as the set when a frontier pattern is at the beginning or end of
> a Lua pattern. However, this doesn't handle the case of a frontier pattern
> in the middle of a Lua string, preceded or followed only by repetition that
> can (but doesn't always) match nothing. This edge case seems to be the only
> thing left for LPeg to fully duplicate Lua patterns.
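The end-of-string behavior described above can be seen with plain Lua
patterns, no LPeg required. This is only a small sketch, assuming Lua 5.2
or later (where %f is documented); the sample strings are made up for
illustration:

```lua
-- %f[set] matches at a position where the previous character is not in
-- the set and the current character is. Both ends of the subject are
-- treated as if a "\0" sat just outside the string.

-- Start of subject: the virtual "\0" before "a" is not a letter, so the
-- frontier matches at position 1.
local s1 = ("abc def"):find("%f[%a]")

-- End of subject: the virtual "\0" after "c" is a non-letter, so %f[%A]
-- matches (as an empty match) at position #subject + 1.
local s2 = ("abc"):find("%f[%A]")

-- Count the word starts and word ends in a sample string.
local sample = "THE (QUICK) brOWN"
local _, starts = sample:gsub("%f[%a]", "")
local _, ends   = sample:gsub("%f[%A]", "")

print(s1, s2, starts, ends)
```

The last word end is counted only because of the virtual "\0" after the
final "N", which is the case an LPeg translation has to handle explicitly.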
I was able to replicate the following with LPeg:
string.gsub ("THE (QUICK) brOWN FOx JUMPS", "%f[%a]%u+%f[%A]", print)
but it's not pretty (I'm implementing a straight translation here):
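local lpeg = require "lpeg"
local R    = lpeg.R
local P    = lpeg.P
local Cs   = lpeg.Cs

local upper     = R"AZ"
local alpha     = R("AZ","az")
local non_alpha = P(1) - alpha

local parse = Cs(
    (
      lpeg.B(non_alpha)                                  -- [a]
      + P(function(_,p) if p == 1 then return p end end)
    )
    * upper^1                                            -- %u+
    * (#non_alpha + P(-1))                               -- [b]
  )
  / function(c) print(c) return c end

local test    = "THE (QUICK) brOWN FOx JUMPS"
local luapatt = "%f[%a]%u+%f[%A]"

-- Try the pattern at every position, skipping one character on failure.
local scan = (parse + P(1))^0

string.gsub(test,luapatt,print) -- Lua pattern version
scan:match(test)                -- LPeg version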
[a] Here we're checking that the previous character is NOT an alphabetic
character (A-Z a-z). If this fails, then either the previous
character WAS an alphabetic character, or we're at the start of the
string, where there is no previous character for lpeg.B() to examine.
The second bit (P(function() ... end)) checks to see if we're at
the start of the input.
Taken together, this ensures we're at the start of the string or the
previous character wasn't an alphabetic character. This lets a match
start at "THE" but not at the "OWN" inside "brOWN".
[b] This makes sure the following character is NOT an alphabetic
character, or we're at the end of the input. This matches "THE",
"QUICK" and "JUMPS" but excludes "FOx".
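The [a] and [b] checks follow the same shape, so they could be folded into
one helper. What follows is only a sketch under a few assumptions: the name
frontier() is made up, the set must be an LPeg pattern that matches exactly
one character, and the start of the subject is detected with -lpeg.B(P(1))
instead of the position-checking function. It also tests the set against
"\0" to decide whether the end of the input should count, as suggested
earlier in the thread:

```lua
local lpeg = require "lpeg"
local P, R, B, C, Ct = lpeg.P, lpeg.R, lpeg.B, lpeg.C, lpeg.Ct

-- Hypothetical helper: approximates Lua's %f[set] for a single-character
-- LPeg pattern `set`. It never consumes input.
local function frontier(set)
  -- No previous character at all means we are at the start of the subject.
  local start   = -B(P(1))
  -- Either the previous character is outside the set, or there is none.
  local prev_ok = B(P(1) - set) + start
  -- Lua compares the end of the subject against "\0", so the end counts
  -- as "in the set" only when the set itself accepts "\0".
  local at_end  = set:match("\0") and P(-1) or P(false)
  return prev_ok * (#set + at_end)
end

local alpha     = R("AZ", "az")
local non_alpha = P(1) - alpha

-- Rough equivalent of "%f[%a]%u+%f[%A]"
local word = frontier(alpha) * C(R"AZ"^1) * frontier(non_alpha)
local scan = Ct((word + P(1))^0)

local found = scan:match("THE (QUICK) brOWN FOx JUMPS")
print(table.concat(found, " "))
```

This still does not cover a frontier that sits next to an optional
repetition in the middle of a pattern, since LPeg repetitions do not
backtrack the way Lua's do.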
I will say that the frontier pattern is not something I've needed, either
with Lua patterns or with LPeg. I can see it being useful for some parsing
tasks (like pulling out upper case words in the input) but so far, I've been
able to skip this type of pattern entirely.
But again, it *is* possible with LPeg.