Hi list,
I was trying to understand exactly what lpeg.Cg does when it creates
anonymous group captures, and I found something weird (at least on
lpeg-1.0.2), so let me ask about it.
This is the code; PP is my favorite pretty-printing function and the
results of the tests are after the "-->"s:
lpeg = require "lpeg"   -- assign explicitly: on Lua 5.2+ require does not set the global
B,C,P,R,S,V = lpeg.B,lpeg.C,lpeg.P,lpeg.R,lpeg.S,lpeg.V
Cb,Cc,Cf,Cg = lpeg.Cb,lpeg.Cc,lpeg.Cf,lpeg.Cg
Cp,Cs,Ct = lpeg.Cp,lpeg.Cs,lpeg.Ct
Carg,Cmt = lpeg.Carg,lpeg.Cmt
lpeg.pm = function (pat, str) PP(pat:match(str or "")) end
char = C(1)
char2 = C(1)*C(1)
char2g = Cg(C(1)*C(1))
f = function (...) return "("..table.concat({...}, ",")..")" end
(char * char ^0):pm("abcde") --> "a" "b" "c" "d" "e"
(char * char2 ^0):pm("abcde") --> "a" "b" "c" "d" "e"
(char * char2g^0):pm("abcde") --> "a" "b" "c" "d" "e"
(char2g * char2g^0):pm("abcdef") --> "a" "b" "c" "d" "e" "f"
(char * char ^0):Cf(f):pm("abcde") --> "((((a,b),c),d),e)"
(char * char2 ^0):Cf(f):pm("abcde") --> "((((a,b),c),d),e)"
(char * char2g^0):Cf(f):pm("abcde") --> "((a,b,c),d,e)"
(char2g * char2g^0):Cf(f):pm("abcdef") --> "((a,c,d),e,f)"
(char * char ^0):Ct():pm("abcde") --> {1="a", 2="b", 3="c", 4="d", 5="e"}
(char * char2 ^0):Ct():pm("abcde") --> {1="a", 2="b", 3="c", 4="d", 5="e"}
(char * char2g^0):Ct():pm("abcde") --> {1="a", 2="b", 3="c", 4="d", 5="e"}
(char2g * char2g^0):Ct():pm("abcdef") --> {1="a", 2="b", 3="c", 4="d", 5="e", 6="f"}
((char * char ^0) / 2):pm("abcde") --> "b"
((char * char2 ^0) / 2):pm("abcde") --> "b"
((char * char2g^0) / 2):pm("abcde") --> "b"
((char2g * char2g^0) / 2):pm("abcdef") --> "b"
Some of the patterns on the left in the tests above produce
"a" "b" "c" "d" "e"
as five separate captures, some produce them as "a" plus two captures
with two values each, grouped like this,
"a" ("b" "c") ("d" "e")
and some produce three captures with two values each, grouped like
this:
("a" "b") ("c" "d") ("e" "f")
and apparently only lpeg.Cf() distinguishes these three cases. I
couldn't find anything else, besides lpeg.Cf(), that would _not_
flatten them into five or six separate values.
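To make the grouping concrete, here is a small pure-Lua model (no lpeg
needed) of what I think the fold is doing. It assumes that the first
value of the first capture becomes the initial accumulator, and that
each later capture passes all of its values to the function as extra
arguments. The name fold_model and the list-of-lists representation
are made up for this sketch; they are not part of lpeg.

```lua
-- Each capture is modelled as a list of values: a plain capture has one
-- value, an anonymous group has several. fold_model is a hypothetical
-- helper for illustration, not an lpeg function.
local unpack = table.unpack or unpack   -- works on Lua 5.1 and 5.2+

local function fold_model(captures, func)
  -- assumption: only the first value of the first capture seeds the accumulator
  local acc = captures[1][1]
  for i = 2, #captures do
    -- assumption: every later capture contributes all of its values at once
    acc = func(acc, unpack(captures[i]))
  end
  return acc
end

local f = function (...) return "("..table.concat({...}, ",")..")" end

print(fold_model({{"a"},{"b"},{"c"},{"d"},{"e"}}, f))  --> ((((a,b),c),d),e)
print(fold_model({{"a"},{"b","c"},{"d","e"}}, f))      --> ((a,b,c),d,e)
print(fold_model({{"a","b"},{"c","d"},{"e","f"}}, f))  --> ((a,c,d),e,f)
```

Under these assumptions the model reproduces the three Cf results
above, including the "b" that disappears in the last case.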
Are there other ways - besides lpeg.Cf() - to access the captures
while they are still in this form,
"a" ("b" "c") ("d" "e")
or
("a" "b") ("c" "d") ("e" "f"),
before the groups are unpacked?
Thanks in advance!