- Subject: LPEG-based relaxed parsing again
- From: Paul K <paul@...>
- Date: Thu, 04 Sep 2014 05:05:13 +0000
Hi All,
I've made some progress with relaxed parsing of the Lua grammar (thanks to
all who helped with my earlier questions), but have stumbled on an
issue I can't find a solution for. I'm sure it's caused by my limited
understanding of LPEG processing, so I would appreciate any
advice.
Here is the setup. I have a grammar that allows zero or more
statements of various types, but I also want to accept and ignore
anything that doesn't match any of those types. Using the grammar
(below): "do end", "do (1) end", "do (1)(2) end" are all valid
examples and "do (1)a(2) end" is not, but I want it to be processed in
the same way as "do (1)(2) end" (with "a" ignored).
What I tried to do is use lpeg.V("Stat")^0 + lpeg.C(lpeg.P(1)), but
this doesn't allow "a" to be captured and the processing to continue. I
also tried (lpeg.V("Stat") + lpeg.C(lpeg.P(1)))^0; however, this
doesn't work either, as the catch-all also captures fragments that
belong to the enclosing rule, and ^0 doesn't backtrack to give them back.
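To make the two failure modes concrete, here is a minimal sketch that swaps V"Stat" for a hypothetical stand-in `stat` (a parenthesized digit string) and writes "end" without surrounding spaces. It is a reduced illustration, not the full grammar from this message.

```lua
local lpeg = require 'lpeg'

-- Hypothetical stand-in for V"Stat": "(" digits ")".
local stat = lpeg.P"(" * lpeg.R"09"^0 * lpeg.P")"

-- Attempt 1: stat^0 succeeds even when it matches nothing, so the ordered
-- choice never falls through to the catch-all branch.
local attempt1 = stat^0 + lpeg.C(lpeg.P(1))

-- Attempt 2: the catch-all runs inside the loop, so it also eats the
-- closing "end"; the loop is greedy and does not give it back.
local attempt2 = (stat + lpeg.C(lpeg.P(1)))^0 * lpeg.P"end"

print(lpeg.match(attempt1, "(1)a(2)end"))  --> 4   (stops before "a", nothing captured)
print(lpeg.match(attempt2, "(1)a(2)end"))  --> nil (the loop consumed "end")
```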
The question is: how do I write an expression that takes zero or more
repetitions of a pattern and (separately) captures all non-matching
strings?
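One common way to express this in LPEG (a sketch, not necessarily the only approach) is to let the catch-all match only where neither a statement nor the block terminator can start, using the difference operator `patt1 - patt2`, which behaves as a negative lookahead on `patt2`. Here `stat`, `terminator`, and `junk` are hypothetical stand-ins, and consecutive unknown characters are grouped into a single capture.

```lua
local lpeg = require 'lpeg'

local stat = lpeg.P"(" * lpeg.R"09"^0 * lpeg.P")"  -- hypothetical stand-in for V"Stat"
local terminator = lpeg.P"end"                      -- what closes the block

-- One or more characters at which neither a statement nor the terminator
-- starts; the lookahead keeps the catch-all from eating "end".
local junk = (1 - stat - terminator)^1

-- Statements are matched as before; each run of junk is captured separately.
local block = lpeg.Ct((stat + lpeg.C(junk))^0) * terminator

local skipped = lpeg.match(block, "(1)ab(2)end")
print(#skipped, skipped[1])  --> 1   ab
```

Neither `stat` nor `junk` can match the empty string, so the `^0` loop is well formed, and a missing terminator makes the whole match fail instead of being swallowed.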
Here is my simplified example. It almost works, but in addition to
capturing "a" as unknown (which is what I want), it also captures
"end" as unknown, which is what I don't want:
local lpeg = require 'lpeg'
-- Match p; if it is missing, report err and continue without consuming input.
local function recover(p, err)
  return p + lpeg.Cmt(lpeg.Cc(err),
    function(s, i, ...) print("recover", ...) return true end)
end
-- Match p; otherwise consume and report a single character as unknown.
local function unknown(p)
  return p + lpeg.Cmt(lpeg.C(lpeg.P(1)),
    function(s, i, ...) print("unknown", ...) return true end)
end
-- Build a node from the start position and any nested captures.
local function capture(pos, ...)
  print("capture", pos, ...)
  return { pos = pos, ... }
end
-- A token is a pattern followed by optional spaces.
local function token(p) return p * lpeg.S(" ")^0 end
local chunk = lpeg.P { "Chunk";
  Chunk = lpeg.V("Block") * -1 + error;
  Block = unknown(lpeg.V("Stat"))^0; --<-- "unknown" handling
  DoStat = lpeg.Cp() * token(lpeg.P"do") * lpeg.V("Block") *
    recover(token(lpeg.P"end"), "end") / capture;
  ExprStat = lpeg.Cp() * token(lpeg.P"(") * token(lpeg.R("09")^0) *
    recover(token(lpeg.P")"), ")") / capture;
  Stat = lpeg.V("ExprStat") + lpeg.V("DoStat");
}
print("matches", lpeg.match(chunk, "do (1)(2) end"))
print("matches with unknown", lpeg.match(chunk, "do (1)a(2) end"))
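Applied to the example, a minimal fix, assuming the keyword "end" is the only thing that can close a Block, is to guard the catch-all with a negative predicate so that unknown never starts at "end" and DoStat's recover(token(lpeg.P"end"), "end") gets to see it. `END` and the `skipped` table (used here instead of printing) are hypothetical additions; the rest is the grammar from this message.

```lua
local lpeg = require 'lpeg'

local skipped = {}  -- hypothetical: collects ignored fragments instead of printing

-- Assumption: "end" closes a block only as a whole word ("endx" does not).
local END = lpeg.P"end" * -lpeg.R("az", "AZ", "09", "__")

local function recover(p, err)
  return p + lpeg.Cmt(lpeg.Cc(err),
    function(s, i, ...) print("recover", ...) return true end)
end

-- Same as before, except the catch-all refuses to start at "end",
-- leaving it for DoStat's recover(token(lpeg.P"end"), "end").
local function unknown(p)
  return p + -END * lpeg.Cmt(lpeg.C(lpeg.P(1)),
    function(s, i, c)
      skipped[#skipped + 1] = c
      return true
    end)
end

local function capture(pos, ...)
  return { pos = pos, ... }
end

local function token(p) return p * lpeg.S(" ")^0 end

local chunk = lpeg.P { "Chunk";
  Chunk = lpeg.V("Block") * -1 + error;
  Block = unknown(lpeg.V("Stat"))^0;
  DoStat = lpeg.Cp() * token(lpeg.P"do") * lpeg.V("Block") *
    recover(token(lpeg.P"end"), "end") / capture;
  ExprStat = lpeg.Cp() * token(lpeg.P"(") * token(lpeg.R("09")^0) *
    recover(token(lpeg.P")"), ")") / capture;
  Stat = lpeg.V("ExprStat") + lpeg.V("DoStat");
}

local tree = lpeg.match(chunk, "do (1)a(2) end")
print(tree.pos, #tree, skipped[1])  --> 1   2   a
```

Because lpeg.Cmt runs at match time, an entry could in principle be recorded in `skipped` for a branch that later backtracks; in this grammar that does not happen, since recover always succeeds once a statement has started. Also, a stray top-level "end" is no longer swallowed, so such input now reaches the `+ error` branch and raises.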
Thank you.
Paul.