• Subject: lpeg could infer the validity of this grammar.
• From: Sam Putman <atmanistan@...>
• Date: Sun, 14 Dec 2014 11:39:58 -0800

This is my first post to Lua list, so let me say hello and thank you.

Eliminating left recursion is a sometimes frustrating part of writing a PEG grammar.

One way to do this rewrite is shown below. Sadly, lpeg doesn't allow this.

```lua
require "lpeg"

match, P, V, R, S = lpeg.match, lpeg.P, lpeg.V, lpeg.R, lpeg.S

space = P" "
digit = R"09"
operand = S"+-*/"
token = space + P"(" + digit^1 + P")" + operand

tokenizer = token^1

group = P {
"groups";
groups = #token * V"group" + token ;
group = P"(" * V"groups"^0 * P")";
}
---[[
expr = P {
"expr";
expr = token + ( V"group" + V"factor" + V"term")
group = P"(" * V"expr"^0 * P")";
factor = #token * V"expr" * (P"*" + P"/") * V"expr";
term = token; -- we do not reach this point.
}
--]] -- The following asserts are valid if expr is commented:

test_s = "(1 + 2 / 3) + 4 * 5 + (6)"
group_s = "(1 (2))"
invalid_token =  "(1 + 2 / ; 3) + 4 * 5 + (6)"
assert(match(tokenizer,test_s) == #test_s+1)
assert(match(tokenizer,invalid_token) ~= #invalid_token+1)
assert(match(group,group_s))
```

This program fails with "rule 'expr' may be left recursive".

I assure you, this is not possible. This program is somewhat of a logical proof of that.

The # predicate means that, if token is called by the rule, it will consume part of the string. Because we put it at the start of expr, it is given a *chance*to*fail* before any infinite recursion can take place.

lpeg does need to follow the call chain and ensure that token is an inevitable call. This is static inference of the grammar with no runtime cost. It is moderately complex, we must assure that it's impossible for the call to expr to succeed on empty without first calling token. PEGs are deterministic, so this is guaranteed to be a possible analysis.

This was designed so that it should succeed deterministically. The naive state machine will always check for token in expr before checking the grouped terms, and factor uses #token to assure that this first parse will succeed before expr is allowed to be called.

This grammar will never recurse, on valid or invalid input. It will behave as specified. It's badly broken as a real _expression_ parser, intended only to demonstrate the principle.

I am willing to accept a grammar that shows that it's impossible to statically infer whether #pattern propagation can or cannot be performed, as a counterargument. Otherwise, I would very much like to see lpeg support this inference.