Would you rather have something like the syntax below? (I don't know how
implementable it is, and I didn't much care about precedence; I was just
playing with syntax.)
I also don't know whether there are reusable PEG grammars...

Proposed Lua      PEG    Description
L''               ''     Literal string
L""               ""     Literal string
R/S''             []     Character class
any               .      Any character
(e)               (e)    Grouping
opt(e) or e^opt   e?     Optional
e^0               e*     Zero-or-more
e^1               e+     One-or-more
e1 + e2           e1 e2  Sequence
e1 / e2           e1/e2  Prioritized Choice
pand(e)           &e     And-predicate
pnot(e)           !e     Not-predicate

mt  = {}
E   = {}
-- Each helper returns an E object whose expr field holds the PEG text.
function opt(e) return E:new{expr=e.expr.."?"} end
function L(literal) return E:new{expr="'"..literal.."'"} end
function R(range_expression) return E:new{expr="["..range_expression.."]"} end
function pand(e) return E:new{expr="&("..e.expr..")"} end
function pnot(e) return E:new{expr="!("..e.expr..")"} end
-- e^opt calls opt(e); e^0 and e^1 map to the PEG suffixes * and +.
function mt.__pow(e, i) if type(i) == "function" then return i(e)
                        elseif i == 0 then return E:new{expr=e.expr.."*"}
                        elseif i == 1 then return E:new{expr=e.expr.."+"}
                        else error("unsupported exponent") end end
function mt.__add(e1, e2) return E:new{expr=e1.expr.." "..e2.expr} end
function mt.__div(e1, e2) return
E:new{expr="("..e1.expr..")/("..e2.expr..")"} end
function E:new(o)
  o = o or {}
  o.expr = o.expr or ""
  setmetatable(o, mt)
  return o
end
any = E:new{expr="."}

g = (L'c'^0 + pnot(L'a')) / (pand(L('b')^opt + R'ab'^1) + opt(R'cd'))
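
The builder above ignores precedence, as noted. Here is a minimal sketch of one way to handle it, assuming each expression also records how tightly it binds. The `prec` field, the level constants and the `wrap` helper are hypothetical names chosen for illustration; none of this is part of lpeg.

```lua
-- Binding levels, loosest to tightest.
local CHOICE, SEQ, PREFIX, SUFFIX, PRIMARY = 1, 2, 3, 4, 5

local mt = {}

-- Create an expression node holding its PEG text and binding level.
local function new(expr, prec)
  return setmetatable({expr = expr, prec = prec}, mt)
end

-- Parenthesize e only if it binds more loosely than the context requires.
local function wrap(e, min)
  if e.prec < min then return "(" .. e.expr .. ")" end
  return e.expr
end

local function L(s)    return new("'" .. s .. "'", PRIMARY) end
local function R(set)  return new("[" .. set .. "]", PRIMARY) end
local any = new(".", PRIMARY)
local function opt(e)  return new(wrap(e, PRIMARY) .. "?", SUFFIX) end
local function pand(e) return new("&" .. wrap(e, SUFFIX), PREFIX) end
local function pnot(e) return new("!" .. wrap(e, SUFFIX), PREFIX) end

-- e^opt applies opt to e; e^0 and e^1 become the suffixes * and +.
function mt.__pow(e, i)
  if type(i) == "function" then return i(e) end
  if i == 0 then return new(wrap(e, PRIMARY) .. "*", SUFFIX) end
  if i == 1 then return new(wrap(e, PRIMARY) .. "+", SUFFIX) end
  error("unsupported exponent: " .. tostring(i))
end

-- Sequence: only a choice operand needs parentheses.
function mt.__add(a, b)
  return new(wrap(a, SEQ) .. " " .. wrap(b, SEQ), SEQ)
end

-- Prioritized choice binds loosest, so its operands never need parentheses.
function mt.__div(a, b)
  return new(a.expr .. " / " .. b.expr, CHOICE)
end

local g = (L'c'^0 + pnot(L'a')) / (pand(L('b')^opt + R'ab'^1) + opt(R'cd'))
print(g.expr)  --> 'c'* !'a' / &('b'? [ab]+) [cd]?
```

Note that Lua's own precedence still decides how the expression tree is built (`/` binds tighter than `+` in Lua, the reverse of what PEG choice and sequence need), so parentheses are still required on the Lua side; the sketch only avoids redundant parentheses in the generated PEG text.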

-----Original Message-----
On Behalf Of Sam Roberts
Sent: Wednesday, April 11, 2007 7:26 PM
Subject: Re: Overloading and extending operators, (l)PEGs and grammars

I can appreciate how difficult it is to make a small language with
operator extensions. It's too bad; things like lpeg could benefit from
them.

The PEG operators (*, +, /, etc.) are easy and mnemonic:

	''     Literal string
	""     Literal string
	[]     Character class
	.      Any character
	(e)    Grouping
	e?     Optional
	e*     Zero-or-more
	e+     One-or-more
	e1 e2  Sequence
	e1/e2  Prioritized Choice
	&e     And-predicate
	!e     Not-predicate

Anybody who has used a regex, or one of dozens of EBNF variants, can
remember this easily.

With lpeg, we have:

  Operator           Description
  lpeg.P(string)     Matches string literally
  lpeg.P(number)     Matches exactly number characters
  lpeg.S(string)     Matches any character in string (set)
  lpeg.R("xy")       Matches any character between x and y (range)
  patt^n             Matches at least n repetitions of patt
  patt^-n            Matches at most n repetitions of patt
  patt1 * patt2      Matches patt1 followed by patt2
  patt1 + patt2      Matches patt1 or patt2 (ordered choice)
  patt1 - patt2      Matches patt1 if patt2 does not match
  -patt              Equivalent to "" - patt
  patt1 / ...        Used to capture matches? Why not have the same meaning as PEG?

There isn't any commonality here; I find it quite anti-mnemonic (all the
operators are used for different purposes than in the original PEG
grammars). I can't read a PEG without the table above taped to my monitor.

With boost::spirit (which looks pretty similar to PEGs, though there
might be theoretical differences in capability), you can use C++'s much
more flexible operator overloading to get:

	!P        Matches P or an empty string
	*P        Matches P zero or more times
	+P        Matches P one or more times
	~P        Matches anything that does not match P
	P1 | P2   Matches P1 or P2
	P1 - P2   Matches P1 but not P2
	P1 >> P2  Matches P1 followed by P2
	P1 % P2   Matches one or more P1 separated by P2
	P1 & P2   Matches both P1 and P2
	P1 ^ P2   Matches P1 or P2, but not both
	P1 && P2  Synonym for P1 >> P2
	P1 || P2  Matches P1 | P2 | P1 >> P2

It starts off well: the unary operators are pretty familiar, as is |.
After that it gets successively worse, as boolean and other operators
are stolen for things with no particular relation to their usual meaning.

I have mixed feelings about (ab)using operator overloading to support
expression of grammars. I can see the appeal of somehow adding grammars as
elements of the language, rather than as strings, but strings aren't so
hard to use, and are fairly flexible. I wonder if it wouldn't be better to
use lpeg to write something like:

equalcount = lpeg.grammar[[
	S = "0" B
	  / "1" A
	  / ""
	A = "0" S
	  / "1" A A
	B = "1" S
	  / "0" B B
]]

instead of:

	local S, A, B = 1, 2, 3
	equalcount = lpeg.P{
	  [S] = "0" * lpeg.V(B) + "1" * lpeg.V(A) + "",
	  [A] = "0" * lpeg.V(S) + "1" * lpeg.V(A) * lpeg.V(A),
	  [B] = "1" * lpeg.V(S) + "0" * lpeg.V(B) * lpeg.V(B),
	} * -1
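
To make the semantics of the equalcount grammar concrete, here is a minimal hand-written recursive-descent sketch in plain Lua, with no lpeg dependency. The functions `S`, `A` and `B` mirror the grammar's nonterminals; `lit` and `matches_equalcount` are hypothetical helper names introduced only for this illustration. Each rule tries its alternatives in order and commits to the first one that succeeds, which is what PEG's prioritized choice means.

```lua
-- Each rule takes the subject s and a position i, and returns the position
-- just after the match, or nil if the rule does not match at i.
local S, A, B

-- Match the single character c at position i.
local function lit(s, i, c)
  if s:sub(i, i) == c then return i + 1 end
end

-- S = "0" B / "1" A / ""
function S(s, i)
  local j = lit(s, i, "0"); j = j and B(s, j)
  if j then return j end
  j = lit(s, i, "1"); j = j and A(s, j)
  if j then return j end
  return i  -- the empty alternative always succeeds
end

-- A = "0" S / "1" A A
function A(s, i)
  local j = lit(s, i, "0"); j = j and S(s, j)
  if j then return j end
  j = lit(s, i, "1"); j = j and A(s, j); j = j and A(s, j)
  return j
end

-- B = "1" S / "0" B B
function B(s, i)
  local j = lit(s, i, "1"); j = j and S(s, j)
  if j then return j end
  j = lit(s, i, "0"); j = j and B(s, j); j = j and B(s, j)
  return j
end

-- Require S to consume the whole subject, like the trailing "* -1".
local function matches_equalcount(s)
  return S(s, 1) == #s + 1
end

print(matches_equalcount("0110"))  --> true
print(matches_equalcount("011"))   --> false
```

Writing the rules out by hand like this is exactly the boilerplate that either notation, string grammar or overloaded operators, is meant to remove.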


pascalcomment = lpeg.grammar[[
	C = "(*" N* "*)"
	N = C
	  / !"(*" .
]]

instead of