lua-l archive



On Thu, Sep 04, 2014 at 05:05:13AM +0000, Paul K wrote:
> Hi All,
> 
> I've made some progress with relaxed parsing of Lua grammar (thanks to
> all who helped with my earlier questions), but have stumbled on an
> issue I can't find a solution for. I'm sure it's caused by my limited
> understanding of LPEG processing, so would be interested in any
> advice.
> 
> Here is the setup. I have a grammar that allows zero or more
> statements of various types, but I also want to accept and ignore
> anything that doesn't match any of those types. Using the grammar
> (below): "do end", "do (1) end", "do (1)(2) end" are all valid
> examples and "do (1)a(2) end" is not, but I want it to be processed in
> the same way as "do (1)(2) end" (with "a" ignored).
> 
> What I tried to do is use lpeg.V("Stat")^0 + lpeg.C(lpeg.P(1)), but
> this doesn't allow "a" to be captured and the processing continued; I
> also tried to do (lpeg.V("Stat") + lpeg.C(lpeg.P(1)))^0, however this
> doesn't work either as it captures valid fragments before ^0
> backtracking.
> 
> The question is: how do I write the expression that takes zero or more
> repetitions of a pattern and (separately) captures all non-matching
> strings?

I'm not sure if it's possible to write a PEG which parses completely random
segments inserted into the text. You're describing a grammar, and the input
source must consistently obey certain logical rules. You may need to simply
hand-roll a parser the old fashioned way which can directly implement your
heuristic for consuming garbage.
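To make "the old fashioned way" concrete, here is a minimal sketch in plain Lua of a recursive-descent parser with a garbage-skipping heuristic. The grammar from the original question isn't quoted in this message, so the toy grammar (`do`, parenthesised numbers, nested blocks), the function name `parse`, and the recovery rule (skip a run of characters up to the next whitespace or `(`) are all hypothetical.

```lua
-- Hand-rolled recursive-descent parser for a HYPOTHETICAL toy grammar:
--   block := "do" { stat } "end"
--   stat  := "(" digits ")" | block
-- Anything that cannot start a stat or close the block is kept as garbage.
local function parse(src)
  local pos = 1
  local garbage = {}

  local function skip_space()
    pos = src:find("%S", pos) or (#src + 1)
  end

  -- Accept a keyword only if it is not the prefix of a longer word.
  local function keyword(kw)
    skip_space()
    local after = pos + #kw
    if src:sub(pos, after - 1) == kw and not src:sub(after, after):match("[%w_]") then
      pos = after
      return true
    end
    return false
  end

  local block -- forward declaration: stat and block are mutually recursive

  -- stat := "(" digits ")" | block; on failure nothing is consumed or recorded.
  local function stat()
    skip_space()
    local _, e, num = src:find("^%(%s*(%d+)%s*%)", pos)
    if e then
      pos = e + 1
      return { tag = "Paren", value = tonumber(num) }
    end
    local save, mark = pos, #garbage
    local b = block()
    if not b then
      -- undo whatever the failed nested block consumed or recorded
      pos = save
      for i = #garbage, mark + 1, -1 do garbage[i] = nil end
    end
    return b
  end

  function block()
    if not keyword("do") then return nil end
    local body = {}
    while true do
      if keyword("end") then return { tag = "Do", body = body } end
      if pos > #src then return nil end -- ran out of input: unterminated block
      local st = stat()
      if st then
        body[#body + 1] = st
      else
        -- Recovery heuristic: skip a run of characters up to the next
        -- whitespace or "(", or a single character if that run is empty.
        local _, e = src:find("^[^%s%(]+", pos)
        e = e or pos
        garbage[#garbage + 1] = src:sub(pos, e)
        pos = e + 1
      end
    end
  end

  local ast = block()
  skip_space()
  return ast, garbage, pos > #src -- tree, skipped text, whole input consumed?
end
```

With this sketch, `parse("do (1)a(2) end")` yields a `Do` node with two `Paren` children and records `"a"` as garbage. The point of hand-rolling is that the recovery rule is ordinary code, so it can be as clever (or as crude) as the heuristic requires.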

That said, when I wrote my PEG for parsing Lua, one of the key things I
learned regarding PEGs is the not-predicate (aka negative lookahead
assertion), which is the "-" operator in LPeg. The not-predicate allows you
to describe a universe of inputs defined by what _doesn't_ match. IIRC, the
PEG paper explains the importance of predicates (both positive and
negative), which are just as crucial as ordered choice in dealing with
certain kinds of ambiguity. (Left recursion is a different problem: a PEG
can't express it directly, so it has to be rewritten as repetition.)
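A short illustration of the two forms of the not-predicate in LPeg, assuming the `lpeg` module is installed; the pattern names `upto_end` and `eof` are just for illustration. The binary form `p1 - p2` matches `p1` only where `p2` fails, and the unary form `-p` is the bare predicate.

```lua
local lpeg = require "lpeg"
local P, C = lpeg.P, lpeg.C

-- Binary form, p1 - p2: match p1 only where p2 does NOT match at the current
-- position (equivalent to -p2 * p1). p2 only looks ahead: it consumes no input
-- and any captures it would make are discarded.
-- Here: one or more characters, stopping just before the first "end".
local upto_end = C((1 - P"end")^1)

-- Unary form, -p: succeeds without consuming anything exactly when p fails.
-- -P(1) ("no character follows") therefore matches only at end of input.
local eof = -P(1)

-- upto_end:match("abc end")  returns "abc "
-- upto_end:match("end")      returns nil (nothing precedes "end")
-- eof:match("")              returns 1
-- eof:match("x")             returns nil
```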

I've seen Lua syntax parsers written in LPeg which didn't understand how to
use the not predicate, and so did funky things trying to distinguish
prefixexp ambiguities in the Lua BNF syntax description. In fact, I rarely
see predicates used in PEGs, despite the fact that they're very powerful and
can be used to describe grammars more concisely even when they're not
strictly necessary.
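A smaller, lexical-level instance of the same point is separating reserved words from names. This is a sketch assuming LPeg; the keyword list is a hypothetical subset of Lua's reserved words, and `kw`, `keyword`, and `name` are illustrative names.

```lua
local lpeg = require "lpeg"
local P, R = lpeg.P, lpeg.R

local wordstart = R("az", "AZ", "__")
local wordchar  = R("az", "AZ", "09", "__")

-- A keyword is the literal word NOT followed by another identifier character,
-- so "end" is a keyword but the first three letters of "ending" are not.
local function kw(w) return P(w) * -wordchar end

-- Hypothetical subset of Lua's reserved words, for illustration only.
local keyword = kw"do" + kw"end" + kw"if" + kw"then"

-- A name is any identifier-shaped word that is not a keyword.
local name = (wordstart * wordchar^0) - keyword

-- name:match("ending")  returns 7 (the whole word)
-- name:match("end")     returns nil
```

Without the predicates, the same distinction usually ends up as a post-match table lookup or a fragile ordering of alternatives.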

I'm not sure exactly what you're trying to accomplish, but I think
predicates could possibly help, or at least get you further along. I really
doubt the following would work as-is, but the first thing that popped into
my mind was: V"Stat" + C((P(1)^0) - V"Stat")
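For comparison, here is a sketch of one way that idea might be made to work, assuming LPeg and a hypothetical toy grammar (the real `Stat` rule from the question isn't quoted). As written, `C((P(1)^0) - V"Stat")` tests the predicate only once, at the starting position, after which `P(1)^0` swallows the rest of the input, including the closing `end`. Moving the predicate inside the repetition, as in `(1 - V"Stat")^1`, re-checks it before every character. Because PEG repetition is greedy and never backtracks, the garbage rule must also refuse to consume `end` itself, otherwise the block can never close.

```lua
local lpeg = require "lpeg"
local P, R, S, V, C, Ct = lpeg.P, lpeg.R, lpeg.S, lpeg.V, lpeg.C, lpeg.Ct

local ws = S" \t\r\n"
local space = ws^0
local wordchar = R("az", "AZ", "09", "__")

-- Keyword: the literal word, not followed by an identifier character,
-- plus any trailing whitespace.
local function kw(w) return P(w) * -wordchar * space end

-- HYPOTHETICAL toy grammar standing in for the one in the original question:
--   Block := "do" { Stat | Garbage } "end"
--   Stat  := "(" digits ")" | Block
local grammar = P{
  "Block",
  Block = space * kw"do" * Ct((V"Stat" + V"Garbage")^0) * kw"end",
  Stat = P"(" * space * (R"09"^1 / tonumber) * space * P")" * space
       + V"Block",
  -- The not-predicate sits INSIDE the repetition, so it is re-checked before
  -- every character: garbage stops just before anything that parses as a
  -- Stat, before the closing "end", and before whitespace.
  Garbage = C((1 - V"Stat" - kw"end" - ws)^1) * space,
}

-- Require the whole input to be consumed.
local parser = grammar * -1

-- parser:match("do (1)a(2) end") returns { 1, "a", 2 }:
-- numbers come from Stat, strings are the skipped garbage.
```

The same reasoning likely explains why `(lpeg.V("Stat") + lpeg.C(lpeg.P(1)))^0` fails in a `do ... end` grammar: its fallback branch consumes `e`, `n`, `d` one character at a time, and the loop never gives them back.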