
• Subject: RE: Overloading and extending operators, (l)PEGs and grammars
• From: "Grellier, Thierry" <t-grellier@...>
• Date: Thu, 12 Apr 2007 20:13:38 +0200

```Would you rather have something like this syntax? (I don't know how
implementable it is, and I didn't pay much attention to precedence; I
was just playing with syntax.)
I don't know whether there are reusable PEG grammars...

L''			''     Literal string
L""			""     Literal string
R/S''			[]     Character class
any			.      Any character
(e)			(e)    Grouping
opt(e) or e^opt	e?     Optional
e^0			e*     Zero-or-more
e^1			e+     One-or-more
e1 + e2		e1 e2  Sequence
e1 / e2		e1/e2  Prioritized Choice
pand(e)		&e     And-predicate
pnot(e)		!e     Not-predicate

mt  = {}
E   = {}
function opt(e) return E:new{expr=e.expr.."? "} end
function L(literal) return E:new{expr="'"..literal.."'"} end
function R(range_expression)
  return E:new{expr="["..range_expression.."]"}
end
function pand(e) return E:new{expr="and("..e.expr..")"} end
function pnot(e) return E:new{expr="not("..e.expr..")"} end
function mt.__pow(e, i)
  if type(i) == "function" then return i(e)
  elseif i == 0 then return E:new{expr=e.expr.."*"}
  elseif i == 1 then return E:new{expr=e.expr.."+"}
  end
  return assert(false)
end
function mt.__add(e1, e2) return E:new{expr=e1.expr.." "..e2.expr} end
function mt.__div(e1, e2)
  return E:new{expr="("..e1.expr..")/("..e2.expr..")"}
end
function E:new(o)
o = o or {}
o.expr = o.expr or ""
setmetatable(o, mt)
return o
end
any = E:new{expr="."}  -- "." is PEG's any-character, per the table above

g = (L'c'^0 + pnot(L'a')) / (pand(L('b')^opt + R'ab'^1) + opt(R'cd'))
print(g.expr)

-----Original Message-----
From: lua-bounces@bazar2.conectiva.com.br
[mailto:lua-bounces@bazar2.conectiva.com.br] On Behalf Of Sam Roberts
Sent: Wednesday, April 11, 2007 7:26 PM
To: lua@bazar2.conectiva.com.br
Subject: Re: Overloading and extending operators, (l)PEGs and grammars

I can appreciate how difficult it is to make a small language with
flexible operator extensions. It's too bad; things like lpeg could
benefit from it.

The PEG operators (*, +, /, etc.) are easy and mnemonic:

''     Literal string
""     Literal string
[]     Character class
.      Any character
(e)    Grouping
e?     Optional
e*     Zero-or-more
e+     One-or-more
e1 e2  Sequence
e1/e2  Prioritized Choice
&e     And-predicate
!e     Not-predicate

Anybody who has used a regex, or one of dozens of EBNF variants, can
remember this easily.

With lpeg, we have:

Operator  Description
lpeg.P(string)     Matches string literally
lpeg.P(number)     Matches exactly number characters
lpeg.S(string)     Matches any character in string (set)
lpeg.R("xy")       Matches any character between x and y (range)
patt^n             Matches at least n repetitions of patt
patt^-n            Matches at most n repetitions of patt
patt1 * patt2      Matches patt1 followed by patt2
patt1 + patt2      Matches patt1 or patt2 (ordered choice)
patt1 - patt2      Matches patt1 if patt2 does not match
-patt              Equivalent to "" - patt
patt1 / ...        Used to capture matches? Why not have the same
meaning as PEG?

There isn't any commonality here; I find it quite anti-mnemonic (all the
operators are used for different purposes than in the original PEG
grammars). I can't read a PEG without the table above taped to my
monitor.

With boost::spirit (which looks pretty similar to PEGs, though there
might be theoretical differences in capability), you can use C++'s much
more flexible operator overloading to get:

Unary:
!P        Matches P or an empty string
*P        Matches P zero or more times
+P        Matches P one or more times
~P        Matches anything that does not match P
Binary:
P1 | P2   Matches P1 or P2
P1 - P2   Matches P1 but not P2
P1 >> P2  Matches P1 followed by P2
P1 % P2   Matches one or more P1 separated by P2
P1 & P2   Matches both P1 and P2
P1 ^ P2   Matches P1 or P2, but not both
P1 && P2  Synonym for P1 >> P2
P1 || P2  Matches P1 | P2 | P1 >> P2

It starts off well: the unary operators are pretty familiar, as is |.
After that it gets successively worse, as various boolean and
mathematical operators are stolen for things with no particular
relation to their common usage.

I have mixed feelings about (ab)using operator overloading to support
inline expression of grammars. I can see the appeal to somehow add
grammars as elements of the language, rather than strings, but strings
aren't so hard to use, and are fairly flexible. I wonder if it wouldn't
be better to use lpeg to write something like:

equalcount = lpeg.grammar[[
S = "0" B
/ "1" A
/ ""
A = "0" S
/ "1" A A
B = "1" S
/ "0" B B
]]

instead of:

local S, A, B = 1, 2, 3
equalcount = lpeg.P{
[S] = "0" * lpeg.V(B) + "1" * lpeg.V(A) + "",
[A] = "0" * lpeg.V(S) + "1" * lpeg.V(A) * lpeg.V(A),
[B] = "1" * lpeg.V(S) + "0" * lpeg.V(B) * lpeg.V(B),
} * -1

or

pascalcomment = lpeg.grammar[[
C = "(*" N* "*)"
N = C
/ !"(*" .
]]

instead of

...

Cheers,
Sam

```
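For what it's worth, here is a rough sketch of how the syntax experiment at the top of this message could deal with precedence, which it ignored. It is only an illustration under stated assumptions: the `level` field, the `show` helper and the five level constants are hypothetical names, nothing here comes from lpeg, and the output is just a PEG-notation string rather than a working matcher. Each expression remembers how tightly its outermost operator binds (primary, suffix, prefix, sequence, choice, as in the PEG notation table), and an operand is parenthesised only when it binds more loosely than its new context allows.

```lua
-- Sketch only: builds PEG-notation strings via operator overloading.
-- The names level/show/PRIMARY... are assumptions made for this example.
local PRIMARY, SUFFIX, PREFIX, SEQ, CHOICE = 4, 3, 2, 1, 0

local mt = {}

local function new(expr, level)
  return setmetatable({expr = expr, level = level}, mt)
end

-- Render e, adding parentheses if it binds more loosely than `min`.
local function show(e, min)
  if e.level >= min then return e.expr end
  return "(" .. e.expr .. ")"
end

function L(s) return new("'" .. s .. "'", PRIMARY) end    -- literal
function R(s) return new("[" .. s .. "]", PRIMARY) end    -- character class
any = new(".", PRIMARY)                                   -- any character

function opt(e)  return new(show(e, PRIMARY) .. "?", SUFFIX) end
function pand(e) return new("&" .. show(e, SUFFIX), PREFIX) end
function pnot(e) return new("!" .. show(e, SUFFIX), PREFIX) end

-- e^0 is zero-or-more, e^1 is one-or-more, e^f applies f(e) (e.g. e^opt).
function mt.__pow(e, n)
  if type(n) == "function" then return n(e) end
  if n == 0 then return new(show(e, PRIMARY) .. "*", SUFFIX) end
  if n == 1 then return new(show(e, PRIMARY) .. "+", SUFFIX) end
  error("unsupported repetition: " .. tostring(n))
end

-- e1 + e2 is a sequence; a choice inside a sequence needs parentheses.
function mt.__add(a, b)
  return new(show(a, SEQ) .. " " .. show(b, SEQ), SEQ)
end

-- e1 / e2 is prioritized choice, the loosest operator in PEG notation.
function mt.__div(a, b)
  return new(show(a, CHOICE) .. " / " .. show(b, CHOICE), CHOICE)
end

g = (L'c'^0 + pnot(L'a')) / (pand(L'b'^opt + R'ab'^1) + opt(R'cd'))
print(g.expr)  --> 'c'* !'a' / &('b'? [ab]+) [cd]?
```

Note that Lua's own precedence still applies while the expression is being built: `/` binds tighter than `+` in Lua, the opposite of choice versus sequence in PEG notation, so the Lua source needs explicit parentheses around each sequence, as in `g` above.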