I can appreciate how difficult it is to make a small language with flexible
operator extensions. It's too bad, since things like lpeg could benefit from it.

The PEG operators (*, +, /, etc.) are easy and mnemonic:

	''     Literal string
	""     Literal string
	[]     Character class
	.      Any character
	(e)    Grouping
	e?     Optional
	e*     Zero-or-more
	e+     One-or-more
	e1 e2  Sequence
	e1/e2  Prioritized Choice
	&e     And-predicate
	!e     Not-predicate

Anybody who has used a regex, or one of dozens of EBNF variants, can
remember this easily.

With lpeg, we have:

  Operator  Description
  lpeg.P(string)     Matches string literally
  lpeg.P(number)     Matches exactly number characters
  lpeg.S(string)     Matches any character in string (set)
  lpeg.R("xy")       Matches any character between x and y (range)
  patt^n             Matches at least n repetitions of patt
  patt^-n            Matches at most n repetitions of patt
  patt1 * patt2      Matches patt1 followed by patt2
  patt1 + patt2      Matches patt1 or patt2 (ordered choice)
  patt1 - patt2      Matches patt1 if patt2 does not match
  -patt              Equivalent to "" - patt
  patt1 / ...        Used to capture matches? Why not have the same meaning as PEG?

There isn't any commonality here; I find it quite anti-mnemonic (all the
operators are used for different purposes than in the original PEG grammars). I
can't read an lpeg grammar without the table above taped to my monitor.
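
To make the mismatch concrete, here is a minimal sketch that puts a few lpeg
patterns next to their PEG spelling. It assumes the library loads as the `lpeg`
module; the pattern names (integer, keyword, str, at_end) are made up for
illustration and are not part of lpeg.

```lua
local lpeg = require "lpeg"
local P, R, S = lpeg.P, lpeg.R, lpeg.S

local digit    = R("09")                     -- PEG: [0-9]
local integer  = S("+-")^-1 * digit^1        -- PEG: [+-]? [0-9]+
local keyword  = P("if") + P("else")         -- PEG: "if" / "else"
local nonquote = P(1) - P('"')               -- PEG: !'"' .
local str      = P('"') * nonquote^0 * P('"')  -- PEG: '"' (!'"' .)* '"'
local at_end   = -P(1)                       -- PEG: !.

-- match returns the index just past the match, or nil on failure
print(integer:match("-42"))             -- 4
print(str:match('"hi" x'))              -- 5
print((integer * at_end):match("12a"))  -- nil
```

Note that PEG's `/` becomes `+`, sequencing becomes `*`, and the postfix
`?`, `*`, `+` all become `^` with a number.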

With boost::spirit (which looks pretty similar to PEGs, though there
might be theoretical differences in capability), you can use C++'s much
more flexible operator overloading to get:

	!P        Matches P or an empty string
	*P        Matches P zero or more times
	+P        Matches P one or more times
	~P        Matches anything that does not match P
	P1 | P2   Matches P1 or P2
	P1 - P2   Matches P1 but not P2
	P1 >> P2  Matches P1 followed by P2
	P1 % P2   Matches one or more P1 separated by P2
	P1 & P2   Matches both P1 and P2
	P1 ^ P2   Matches P1 or P2, but not both
	P1 && P2  Synonym for P1 >> P2
	P1 || P2  Matches P1 | P2 | P1 >> P2

It starts off well: the unary operators are pretty familiar, as is |.
After that it gets successively worse, as various boolean and mathematical
operators are stolen for things with no particular relation to their common
meaning.

I have mixed feelings about (ab)using operator overloading to support inline
expression of grammars. I can see the appeal of somehow adding grammars as
elements of the language, rather than strings, but strings aren't so hard to
use, and are fairly flexible. I wonder if it wouldn't be better to use lpeg to
write something like:

equalcount = lpeg.grammar[[
	S = "0" B
	  / "1" A
	  / ""
	A = "0" S
	  / "1" A A
	B = "1" S
	  / "0" B B
]]

instead of:

	local S, A, B = 1, 2, 3
	equalcount = lpeg.P{
	  [S] = "0" * lpeg.V(B) + "1" * lpeg.V(A) + "",
	  [A] = "0" * lpeg.V(S) + "1" * lpeg.V(A) * lpeg.V(A),
	  [B] = "1" * lpeg.V(S) + "0" * lpeg.V(B) * lpeg.V(B),
	} * -1
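
For comparison, here is a rough sketch of how close one can get to the string
form today. It assumes a later lpeg release that ships the `re` module and
accepts named (string-keyed) rules in grammar tables; `lpeg.grammar` above
remains hypothetical. Note that `re` writes rules with `<-` rather than `=`,
and the extra rule G is added here only to require a match to the end of input,
as `* -1` does in the operator version.

```lua
local lpeg = require "lpeg"
local re   = require "re"   -- assumption: the re module distributed with lpeg

-- String form, in re syntax; the first rule (G) is the start rule.
local equalcount_re = re.compile[[
	G <- S !.
	S <- "0" B / "1" A / ""
	A <- "0" S / "1" A A
	B <- "1" S / "0" B B
]]

-- Operator form with named rules instead of numeric indices.
local V = lpeg.V
local equalcount_ops = lpeg.P{
	"S",   -- name of the start rule
	S = "0" * V"B" + "1" * V"A" + "",
	A = "0" * V"S" + "1" * V"A" * V"A",
	B = "1" * V"S" + "0" * V"B" * V"B",
} * -1

-- Both forms should agree: the position after the match, or nil on failure.
for _, s in ipairs{"", "01", "0011", "0101", "001", "110"} do
	assert(equalcount_re:match(s) == equalcount_ops:match(s))
end
```

The two definitions describe the same grammar; which one is easier to read is
exactly the question raised above.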


pascalcomment = lpeg.grammar[[
	C = "(*" N* "*)"
	N = C
	  / !"(*" .
]]

instead of