• Subject: lpeg.Cg, lpeg.Cb, and how to visualize what they do
• From: Eduardo Ochs <eduardoochs@...>
• Date: Sun, 13 Aug 2023 15:50:24 -0300

```
Hi list,

I'm trying to learn how group captures in lpeg "really" work, and it
seems that they store data in a certain data structure that is
somewhere between a table and a list of values. I will refer to these
structures as "Ltables", but this is obviously an improvised name...

In the examples below I will use my favorite pretty-printing function,
"PP", whose output is like this:

PP(2, "3", {4, 5, a=6, [{7,8}]=9, [{7,8}]=10})
--> 2 "3" {1=4, 2=5, "a"=6, {1=7, 2=8}=9, {1=7, 2=8}=10}
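
-- A minimal stand-in for PP, for readers who want to run the examples
-- below. This is only a sketch and not the real PP: the helper names
-- PPfmt and PPstr are made up here, and keys outside the array part
-- are printed in sorted order, which may differ from the real output.
function PPfmt(o)
  if type(o) == "string" then return string.format("%q", o) end
  if type(o) ~= "table" then return tostring(o) end
  local parts, seen, rest = {}, {}, {}
  for i, v in ipairs(o) do              -- array part first, in order
    seen[i] = true
    parts[#parts + 1] = i .. "=" .. PPfmt(v)
  end
  for k, v in pairs(o) do               -- then all the other keys
    if not seen[k] then rest[#rest + 1] = PPfmt(k) .. "=" .. PPfmt(v) end
  end
  table.sort(rest)
  for _, s in ipairs(rest) do parts[#parts + 1] = s end
  return "{" .. table.concat(parts, ", ") .. "}"
end
function PPstr(...)                     -- format all arguments on one line
  local out = {}
  for i = 1, select("#", ...) do out[i] = PPfmt((select(i, ...))) end
  return table.concat(out, " ")
end
function PP(...) print(PPstr(...)) end
PP(2, "3", {4, 5, a=6})   --> 2 "3" {1=4, 2=5, "a"=6}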

If we run this in a REPL,

lpeg = require "lpeg"
B,C,P,R,S,V = lpeg.B,lpeg.C,lpeg.P,lpeg.R,lpeg.S,lpeg.V
Cb,Cc,Cf,Cg = lpeg.Cb,lpeg.Cc,lpeg.Cf,lpeg.Cg
Cp,Cs,Ct    = lpeg.Cp,lpeg.Cs,lpeg.Ct
Carg,Cmt    = lpeg.Carg,lpeg.Cmt
lpeg.pm     = function (pat, str) PP(pat:match(str or "")) end

(Cc("a","b") * Cc("c","d"))                           :pm()
(Cc("a","b") * Cc("c","d"):Cg"e")                     :pm()
(Cc("a","b") * Cc("c","d"):Cg"e")                :Ct():pm()
(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f")        :Ct():pm()
(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e"):Ct():pm()
(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e")     :pm()

we get this output:

>   (Cc("a","b") * Cc("c","d"))                           :pm()
"a" "b" "c" "d"
>   (Cc("a","b") * Cc("c","d"):Cg"e")                     :pm()
"a" "b"
>   (Cc("a","b") * Cc("c","d"):Cg"e")                :Ct():pm()
{1="a", 2="b", "e"="c"}
>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f")        :Ct():pm()
{1="a", 2="b", 3="f", "e"="c"}
>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e"):Ct():pm()
{1="a", 2="b", 3="f", 4="c", 5="d", "e"="c"}
>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e")     :pm()
"a" "b" "f" "c" "d"
>

If we define a table like this,

{20, 30, a=40, a=50, 60}

then the second assignment to "a" will override the first one; lpeg.Cg
does something similar to that...
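
-- A quick check of this in plain Lua. Note that the Lua manual leaves
-- the order of assignments in a constructor undefined when keys
-- repeat, so the value of t.a below is what the reference
-- implementation does, not a guarantee.
t = {20, 30, a=40, a=50, 60}
print(t[1], t[2], t[3], t.a)   --> 20 30 60 50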

Let me use this notation for Ltables. This lpeg pattern

Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f"

matches the empty string, and returns this Ltable:

{."a" "b" e={."c" "d".} "f".}

Ltables can be coerced both to tables, by lpeg.Ct, and to lists of
values. The output of

(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f")             :pm()
(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f")        :Ct():pm()
(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e"):Ct():pm()
(Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e")     :pm()

is:

>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f")             :pm()
"a" "b" "f"
>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f")        :Ct():pm()
{1="a", 2="b", 3="f", "e"="c"}
>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e"):Ct():pm()
{1="a", 2="b", 3="f", 4="c", 5="d", "e"="c"}
>   (Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e")     :pm()
"a" "b" "f" "c" "d"
>

In my (current) way of thinking this lpeg pattern

Cc("a","b") * Cc("c","d"):Cg"e" * Cc"f" * Cb"e"

returns this Ltable,

{."a" "b" ["e"]={."c" "d".} "f" ["e"].}

where the ["e"]={."c" "d".} stores an Ltable in the entry with the key
"e" in the current Ltable, and the ["e"] at the end reads the value
stored in the key "e", coerces it to a list of values, and adds these
values to the current Ltable...
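
-- The reading above can be sketched as a toy model in plain Lua. This
-- is only an illustration of the "Ltable" idea, not how lpeg is
-- implemented; the names Lval, Lgroup, Lbackref, Ltovalues and
-- Ltotable are made up for this sketch.
function Lval(v)           return {kind="val", v=v} end
function Lgroup(name, its) return {kind="group", name=name, items=its} end
function Lbackref(name)    return {kind="backref", name=name} end

-- Coerce to a list of values. A named group adds nothing to the list;
-- it only records its own values under its name. A back reference
-- appends the values recorded most recently under that name.
function Ltovalues(items, env)
  local out = {}
  env = env or {}
  for _, it in ipairs(items) do
    if it.kind == "val" then
      out[#out + 1] = it.v
    elseif it.kind == "group" then
      -- the inner env can read outer groups, but its own stay local
      env[it.name] = Ltovalues(it.items, setmetatable({}, {__index = env}))
    else
      local vals = assert(env[it.name], "no group named " .. it.name)
      for _, v in ipairs(vals) do out[#out + 1] = v end
    end
  end
  return out, env
end

-- Coerce to a table, in the style of lpeg.Ct: the list of values, plus
-- the first value of each named group stored under the group's name.
function Ltotable(items)
  local out, env = Ltovalues(items)
  for name, vals in pairs(env) do out[name] = vals[1] end
  return out
end

items = {Lval"a", Lval"b", Lgroup("e", {Lval"c", Lval"d"}), Lval"f", Lbackref"e"}
print(table.concat(Ltovalues(items), " "))   --> a b f c d
print(Ltotable(items).e)                     --> c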

Questions:
==========
What is the official name of this data structure? Is there a place in
which it is described in more detail than in the Lpeg manual? Where?
The Lpeg manual only talks about "most recent group capture", and it
says this:

"Most recent means the last complete outermost group capture with
the given name. A Complete capture means that the entire pattern
corresponding to the capture has matched. An Outermost capture
means that the capture is not inside another complete capture."

here:

http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html#cap-g
http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html#cap-b

I hope I'm not the only person who finds that too terse... and also,
is there anyone here - besides me - who has tried to draw diagrams to
understand how the operations on captures work? My current diagrams
are here:

http://anggtwu.net/LATEX/2023lpegcaptures.pdf