- Subject: Re: LPEG 're' module self-test fails
- From: Nick Gammon <nick@...>
- Date: Mon, 20 Jun 2016 09:22:28 +1000
Further (again) to my message about the 're' module self-test failure: I think I have worked it out (this took a few days).
It fails to parse any line containing "<-", which led me to investigate why the test "name S !arrow" was failing.
The relevant part of the grammar is here:
pattern <- exp !.
exp <- S (alternative / grammar)
alternative <- seq ('/' S seq)*
seq <- prefix*
...
grammar <- definition+
An "exp" is either an "alternative" or a "grammar", and because PEG choice is ordered, "alternative" is tried first.
Assuming the alternative doesn't use the "/" symbol, we effectively have this:
pattern <- S (prefix* / definition+) !.
-----
We can make up a similar test case:
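local re = require "re"   -- use the returned module table; newer Lua versions do not set a global
local grammar = " ('foo'* / 'bar'+) !."
print (re.match ("foo", grammar))   -- succeeds: prints 4 (the position after the match)
print (re.match ("bar", grammar))   -- fails: prints nil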
That will match a target of "foo" but not "bar". Why? Because zero repetitions of 'foo' are an acceptable match, so on "bar" the first alternative succeeds without consuming anything, the ordered choice commits to it, and the "'bar'+" alternative is never tried. In the same way, "alternative" in the real grammar can match an empty string. A line like this will still match "alternative" (without consuming any characters):
pattern <- exp !.
The final test (that we are at end-of-subject) then fails, and the PEG never goes back to try "grammar".
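To make the ordered-choice behaviour concrete without needing LPeg installed, here is a minimal sketch of a few PEG combinators in plain Lua. The helper names (lit, star, plus, choice, seq, eof) are hypothetical, made up for this illustration, and are not part of the LPeg or re API. It models ('foo'* / 'bar'+) !. and shows that once the first alternative succeeds, even with an empty match, the choice is committed; swapping the alternatives so the one-or-more branch comes first lets "bar" through.

```lua
-- Minimal PEG combinators (hypothetical helpers, NOT the LPeg API).
-- A parser takes (subject, pos) and returns the new position, or nil on failure.

local function lit(s)
  return function(subj, pos)
    if subj:sub(pos, pos + #s - 1) == s then return pos + #s end
    return nil
  end
end

-- p* : zero or more repetitions; always succeeds (possibly consuming nothing)
local function star(p)
  return function(subj, pos)
    while true do
      local nxt = p(subj, pos)
      if not nxt or nxt == pos then return pos end
      pos = nxt
    end
  end
end

-- p+ : one or more repetitions; fails if p does not match at least once
local function plus(p)
  local rest = star(p)
  return function(subj, pos)
    local nxt = p(subj, pos)
    if not nxt then return nil end
    return rest(subj, nxt)
  end
end

-- Ordered choice: the first alternative that succeeds wins and is never revisited
local function choice(a, b)
  return function(subj, pos)
    local nxt = a(subj, pos)
    if nxt then return nxt end
    return b(subj, pos)
  end
end

-- Sequence: each part must succeed in turn; a later failure fails the whole
-- sequence (there is no going back to try another branch of an earlier choice)
local function seq(...)
  local parts = { ... }
  return function(subj, pos)
    for _, p in ipairs(parts) do
      pos = p(subj, pos)
      if not pos then return nil end
    end
    return pos
  end
end

-- !. : succeeds (consuming nothing) only at the end of the subject
local function eof(subj, pos)
  if pos > #subj then return pos end
  return nil
end

-- ('foo'* / 'bar'+) !.
local pattern = seq(choice(star(lit "foo"), plus(lit "bar")), eof)
-- ('bar'+ / 'foo'*) !.   (the one-or-more branch tried first)
local swapped = seq(choice(plus(lit "bar"), star(lit "foo")), eof)

print(pattern("foo", 1))  --> 4
print(pattern("bar", 1))  --> nil ('foo'* matched "" and the choice committed)
print(swapped("bar", 1))  --> 4
```

Putting the branch that cannot match the empty string first is what allows the choice to fall through to the other branch.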
However, putting "grammar" first (i.e. "(grammar / alternative)" rather than "(alternative / grammar)") makes this work, because "grammar" requires ONE or more definitions (not zero or more) and so fails on a line that is not a grammar, letting the PEG fall back to the "alternative" route.
You could also work around it by insisting that "seq" match at least one prefix:
seq <- prefix+
However, that then fails to accept a totally empty grammar (an empty pattern string).
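The three variants discussed above can be compared on a toy model of the top of the re grammar. This is a sketch under heavy, hypothetical simplifications: a "prefix" is only a name followed by "name S !arrow", a definition's body is a seq rather than a full exp, and the combinator helpers and the build function are made up for illustration (they are not LPeg). It checks a plain sequence, a two-rule grammar, and the empty string against each variant.

```lua
-- Toy model of "pattern <- exp !." (hypothetical, heavily simplified).
-- A parser takes (subject, pos) and returns the new position, or nil on failure.

local function lit(s)
  return function(subj, pos)
    if subj:sub(pos, pos + #s - 1) == s then return pos + #s end
  end
end

-- Match a Lua string pattern anchored at pos (used for names and spaces)
local function lpat(pat)
  return function(subj, pos)
    local _, e = subj:find("^" .. pat, pos)
    if e then return e + 1 end
  end
end

local function star(p)      -- p*
  return function(subj, pos)
    while true do
      local nxt = p(subj, pos)
      if not nxt or nxt == pos then return pos end
      pos = nxt
    end
  end
end

local function plus(p)      -- p+
  local rest = star(p)
  return function(subj, pos)
    local nxt = p(subj, pos)
    if nxt then return rest(subj, nxt) end
  end
end

local function choice(a, b) -- ordered choice a / b
  return function(subj, pos) return a(subj, pos) or b(subj, pos) end
end

local function seq(...)     -- sequence
  local parts = { ... }
  return function(subj, pos)
    for _, p in ipairs(parts) do
      pos = p(subj, pos)
      if not pos then return nil end
    end
    return pos
  end
end

local function notp(p)      -- !p : succeeds without consuming only if p fails
  return function(subj, pos)
    if not p(subj, pos) then return pos end
  end
end

local S, name, arrow, anychar = lpat "%s*", lpat "%a+", lit "<-", lpat "."

-- variant: "original"    exp <- S (alternative / grammar), seq <- prefix*
--          "reordered"   exp <- S (grammar / alternative), seq <- prefix*
--          "prefix_plus" exp <- S (alternative / grammar), seq <- prefix+
local function build(variant)
  local prefix = seq(name, S, notp(arrow))            -- name S !arrow
  local sq = (variant == "prefix_plus") and plus(prefix) or star(prefix)
  local alternative = sq                               -- no '/' in these inputs
  local definition = seq(name, S, arrow, S, sq)        -- body simplified to a seq
  local grammar = plus(definition)                     -- definition+
  local body = (variant == "reordered") and choice(grammar, alternative)
                                         or choice(alternative, grammar)
  return seq(S, body, notp(anychar))                   -- S (...) !.
end

for _, variant in ipairs { "original", "reordered", "prefix_plus" } do
  local pattern = build(variant)
  for _, src in ipairs { "a b", "a <- b c <- d", "" } do
    print(variant, string.format("%q", src), pattern(src, 1) ~= nil)
  end
end
```

In this model, "original" rejects "a <- b c <- d" (the empty "alternative" wins and "!." fails), "reordered" accepts all three inputs, and "prefix_plus" accepts the grammar but rejects the empty string. The "!arrow" in prefix is what stops the body of "a <- b" from swallowing the name "c" of the next rule.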
Reference: http://www.inf.puc-rio.br/~roberto/lpeg/re.html