- Subject: Re: lpeg.Cb bug (?) with repeated evaluation of group capture pattern
- From: Sean Conner <sean@...>
- Date: Wed, 14 Sep 2016 20:16:03 -0400
It was thus said that the Great Roberto Ierusalimschy once stated:
>
> Except for 'Cmt', captures in LPeg have a functional flavor. LPeg does
> not specify in what order it evaluates its captures. It does not specify
> even whether it actually will execute a capture. All that matters are
> the results from the captures. (For instance, the manual does not say
> that LPeg will not execute captures for failed match attempts; what it
> ensures is that those values are not used in its results.)
>
> In general, we should avoid captures with side effects (again, except
> for 'Cmt') or, more to the point, we should not depend on these side
> effects. (Checking whether two tables with the same contents are the
> same is an example of a dependency on a side effect.) If we must
> accumulate something, we should use a fold.
To clarify things, assume the following:

    lpeg = require "lpeg"
    Cmt  = lpeg.Cmt
    Cg   = lpeg.Cg
    Cb   = lpeg.Cb
    Ct   = lpeg.Ct

Now take the following code fragment:

    one = Cg(Ct"","foo")
        * Cb("foo")
        * Cb("foo")
    print(one:match "")

Let's work our way out, starting with Ct():
"Creates a table capture. This capture creates a table and puts all
values from all anonymous captures made by patt inside this table in
successive integer keys, starting at 1. Moreover, for each named
capture group created by patt, the first value of the group is put
into the table with the group name as its key. The captured value is
only the table."
The given pattern, "", is the empty string, so it should always match, and
Ct() produces a table as its capture value. Now, Cg():
"Creates a group capture. It groups all values returned by patt into
a single capture. The group may be anonymous (if no name is given)
or named with the given name (which can be any non-nil Lua value).
"An anonymous group serves to join values from several captures into
a single capture. A named group has a different behavior. In most
situations, a named group returns no values at all. Its values are
only relevant for a following back capture or when used inside a
table capture."
I'm taking the capture from Ct() (a table) and associating it with the
name "foo". I'm not inside a Ct(), so this is only relevant if I'm going to
do a back capture (otherwise, why bother?). So now we come to Cb():
"Creates a back capture. This pattern matches the empty string and
produces the values produced by the most recent group capture named
name (where name can be any Lua value).
"Most recent means the last complete outermost group capture with
the given name. A Complete capture means that the entire pattern
corresponding to the capture has matched. An Outermost capture means
that the capture is not inside another complete capture."
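As a minimal sketch of the manual passages quoted so far (the pattern names below are made up for illustration and are not from the original code): a named group yields no value of its own, its first value shows up as a field inside a table capture, and a back capture reproduces it.

```lua
local lpeg = require "lpeg"
local C, Cg, Cb, Ct = lpeg.C, lpeg.Cg, lpeg.Cb, lpeg.Ct

-- A named group on its own produces no capture values, so match()
-- falls back to returning the position after the match.
local named_only = Cg(C"a", "x")
print(named_only:match("a"))        -- 2

-- Inside a table capture, the group's first value is stored under its name.
local in_table = Ct(Cg(C"a", "x"))
print(in_table:match("a").x)        -- a

-- A back capture produces the values of the most recent group named "x".
local with_back = Cg(C"a", "x") * Cb("x")
print(with_back:match("a"))         -- a
```

Here the captured values are strings, so whether a back capture hands out "the same" value or a fresh one cannot be observed; the question only becomes visible with tables.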
So the first Cb() returns the table returned by Ct(). Why does the second
call to Cb() return a new table? Cg() groups all values returned by the
pattern into a single capture, so there should only *be* one capture. Or am
I misreading the manual? Does it mysteriously "rerun" the pattern each time
it's called?
But now let's look at this bit of code:

    two = Cg(Cmt("",function(s,p,c) return p,{} end),"foo")
        * Cb("foo")
        * Cb("foo")
    print(two:match "")

The only change is that Cmt() replaces Ct():
"Creates a match-time capture. Unlike all other captures, this one
is evaluated immediately when a match occurs. It forces the
immediate evaluation of all its nested captures and then calls
function.
"The given function gets as arguments the entire subject, the
current position (after the match of patt), plus any capture values
produced by patt.
"The first value returned by function defines how the match happens.
If the call returns a number, the match succeeds and the returned
number becomes the new current position. (Assuming a subject s and
current position i, the returned number must be in the range [i,
len(s) + 1].) If the call returns true, the match succeeds without
consuming any input. (So, to return true is equivalent to return i.)
If the call returns false, nil, or no value, the match fails.
"Any extra values returned by the function become the values
produced by the capture."
And now the two calls to Cb() return the same table. I think the
confusion here is how

    Cg(Ct"","foo")

seems to keep returning new captures for the given pattern for each call to
Cb(), whereas

    Cg(Cmt("",function(s,p,c) return p,{} end),"foo")

returns the same capture for each call to Cb(). What, exactly, is the
distinction between the two?
> If we must accumulate something, we should use a fold.
Again, given the following:

    lpeg = require "lpeg"
    Cf   = lpeg.Cf
    Ct   = lpeg.Ct
    C    = lpeg.C

I wish to collect values into a table. You may even say I wish to
"accumulate" said values into a table. So, according to you, I should use a
fold for this:

    one = Cf(Ct"" * C(1) * C(1),function(t,v) t[#t+1] = v return t end)
    x = one:match"ab"
    print(x[1],x[2])

But doesn't this do the same thing?

    two = Ct(C(1) * C(1)) -- nary a fold in sight!
    x = two:match"ab"
    print(x[1],x[2])

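The two versions can be put side by side as a quick check. This is a sketch that assumes the installed LPeg still exports Cf (newer releases document it as deprecated in favour of an accumulator capture); the names folded and collected are illustrative only.

```lua
local lpeg = require "lpeg"
local C, Cf, Ct = lpeg.C, lpeg.Cf, lpeg.Ct

-- Fold version: start from an empty table and append each captured value.
local folded = Cf(Ct"" * C(1) * C(1), function(t, v)
  t[#t + 1] = v
  return t
end)

-- Plain table-capture version.
local collected = Ct(C(1) * C(1))

local x = folded:match("ab")
local y = collected:match("ab")
print(table.concat(x, ","))   -- a,b
print(table.concat(y, ","))   -- a,b
```

For this input both patterns produce a table with the same contents, which is the point of the question: the fold adds nothing here that Ct() does not already provide.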
-spc (or am I being too pedantic?)