- Subject: Re: lpeg.Cg, lpeg.Cb, and how to visualize what they do
- From: Sean Conner <sean@...>
- Date: Wed, 16 Aug 2023 11:20:22 -0400
It was thus said that the Great Eduardo Ochs once stated:
>
> Here is a case that I find very strange... no, actually a case that is
> simple to understand followed by one that I find very strange. Compare:
>
> > require "lpeg"
> > = ((lpeg.C(1):Cg"c" * lpeg.C(1):Cg"d") * lpeg.Cb"c"):match"ab"
> a
> > = (lpeg.C(1):Cg"c" * (lpeg.C(1):Cg"d" * lpeg.Cb"c")):match"ab"
> a
I don't see these as different, due to the fact that
(a * b) * c = a * (b * c)
They may generate different code, but they return the same result.
Rewriting your example to how I'd write it (without the leading "lpeg.",
which I assume is either there or the functions are declared as locals):

  ((Cg(C(1),"c") * Cg(C(1),"d")) * Cb"c" ):match "ab"
  ( Cg(C(1),"c") * (Cg(C(1),"d") * Cb"c")):match "ab"
This matches one character and places the result in a group capture named
'c', then matches another character and places that in a group capture
named 'd', then Cb"c" returns the value captured in group 'c'.
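For anyone who wants to try it outside the REPL, here is a small standalone sketch of the two expressions above. It assumes LPeg is installed and loadable with require "lpeg". The Ct table capture at the end is not part of the original example; it is only there to show where the values of the named groups 'c' and 'd' end up.

```lua
local lpeg = require "lpeg"
local C, Cg, Cb, Ct = lpeg.C, lpeg.Cg, lpeg.Cb, lpeg.Ct

-- The same three pieces, grouped two different ways.
local left  = (Cg(C(1), "c") * Cg(C(1), "d")) * Cb"c"
local right =  Cg(C(1), "c") * (Cg(C(1), "d") * Cb"c")

-- Named groups add no values of their own to the match result,
-- so the only value returned is what Cb"c" fetches from group 'c'.
print(left:match "ab")    --> a
print(right:match "ab")   --> a

-- Inside a table capture, each named group becomes a field instead.
local t = Ct(Cg(C(1), "c") * Cg(C(1), "d")):match "ab"
print(t.c)                --> a
print(t.d)                --> b
```

The parentheses change how the pattern tree is built, not which captures precede Cb"c", so both groupings return the same value.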
> = (lpeg.C(1):Cg"c" * (lpeg.C(1):Cg"d" * lpeg.Cb"x")):match"ab"
This one is different:

  (Cg(C(1),"c") * (Cg(C(1),"d") * Cb"x")):match "ab"

In this one, you match a character into group 'c', another character into
group 'd', then try to reference group 'x', which doesn't exist, so LPeg
raises an error instead of returning a value.
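A sketch of what that failure looks like, under the same assumption that LPeg is loadable with require "lpeg". Building the pattern succeeds; the missing group is only noticed when match() resolves the back capture, so the call is wrapped in pcall here. The exact wording of the error message is not checked, since it may differ between LPeg versions.

```lua
local lpeg = require "lpeg"
local C, Cg, Cb = lpeg.C, lpeg.Cg, lpeg.Cb

-- Building the pattern succeeds: group names are not checked here.
local patt = Cg(C(1), "c") * (Cg(C(1), "d") * Cb"x")

-- Cb"x" is resolved during match(), and no group named 'x' precedes it,
-- so match() raises an error; pcall turns that into ok == false.
local ok, err = pcall(lpeg.match, patt, "ab")
print(ok)    --> false
print(err)   -- LPeg's error message (wording depends on the version)
```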
> I draw them as this:
>
>   a      b                  a      b
> \---/  \---/  \---/       \---/  \---/  \---/
>  "a"    "b"   ["c"]        "a"    "b"   ["c"]
> \---/  \---/              \---/  \---/
> c="a"  d="b"              c="a"  d="b"
> \----------/                     \----------/
> c="a"  d="b"                     d="b"  ["c"]
> \-----------------/              \----------/
> c="a"  d="b"  ["c"]              not found?
> \-----------------/
> c="a"  d="b"  "a"
>
> Each ["c"] means "fetch the value associated to the key "c" and append
> it to the current Ltable", and the lower underbrace in the first
> diagram shows the moment in which that fetch happens and the ["c"] is
> replaced by "a". The second diagram shows what I _expected_ that would
> happen in the second match; I expected that in this subpattern
>
> (lpeg.C(1):Cg"d" * lpeg.Cb"c")
>
> the lpeg.Cb"c" would look only at the "Cg"s that happen inside that
> subpattern, and I would get an error like this one...
These are expressions ... what's a "subpattern"? Again,
(a * b) * c = a * (b * c)
> Anyway, I hope that these diagrams would make enough sense to the
> people who can help me fix them, and who can help me fix my mental
> model...
For me, they cloud the issue. To me, a capture captures the text of a
pattern, and possibly transforms it. For example:
local lpeg = require "lpeg"
local P, R, Cs = lpeg.P, lpeg.R, lpeg.Cs

-- character that isn't a " or \
local unescaped = R(" !","#[","]~","\128\255")
local char = unescaped
           + P[[\"]] / [["]]   -- transform \" to "
           + P[[\\]] / [[\]]   -- transform \\ to \
           + P[[\b]] / "\b"    -- and so on
           + P[[\f]] / "\f"    -- (these are technically captures)
           + P[[\n]] / "\n"
           + P[[\r]] / "\r"
           + P[[\t]] / "\t"
           + P[[\/]] / "/"
           + P[[\u]]           -- convert \uXXXX to a UTF-8 character
           * (
               R("09","AF","af")
             * R("09","AF","af")
             * R("09","AF","af")
             * R("09","AF","af")
             )
           / function(c)
               -- c is the whole "\uXXXX" match; utf8.char needs Lua 5.3+
               return utf8.char(tonumber(c:sub(3,-1),16))
             end

-- collect a string, convert escaped characters as needed
-- (named jstring so it doesn't shadow Lua's standard 'string' library)
local jstring = P'"' * Cs(char^0) * P'"'
print(jstring:match [["hello\two\u0072ld\n"]])
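The same idea in miniature, as an illustrative sketch. The patterns here are made up for the illustration and assume LPeg is loadable with require "lpeg". C hands back the text a pattern matched, patt / f transforms that text, and Cs substitutes each transformed piece back into the surrounding match, which is what the string example above relies on.

```lua
local lpeg = require "lpeg"
local P, C, Cs = lpeg.P, lpeg.C, lpeg.Cs

local digits = lpeg.R"09"^1

-- C: capture the matched text unchanged
print(C(digits):match "2023")                 --> 2023

-- patt / function: capture the text, then transform it
print((digits / tonumber):match "2023" + 1)   --> 2024

-- Cs: replace each transformed piece in place within the whole match
local tab2sp = Cs((P"\t" / " " + 1)^0)
print(tab2sp:match "a\tb\tc")                 --> a b c
```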
-spc