On 14/09/16 09:16 PM, Sean Conner wrote:
It was thus said that the Great Roberto Ierusalimschy once stated:
Except for 'Cmt', captures in LPeg have a functional flavor. LPeg does
not specify in what order it evaluates its captures. It does not specify
even whether it actually will execute a capture. All that matters are
the results from the captures.  (For instance, the manual does not say
that LPeg will not execute captures for failed match attempts; what it
ensures is that those values are not used in its results.)

In general, we should avoid captures with side effects (again, except
for 'Cmt') or, more to the point, we should not depend on these side
effects.  (Checking whether two tables with the same contents are the
same is an example of a dependency on a side effect.)  If we must
accumulate something, we should use a fold.

   To clarify things, assume the following:

	lpeg = require "lpeg"
	Cmt  = lpeg.Cmt
	Cg   = lpeg.Cg
	Cb   = lpeg.Cb
	Ct   = lpeg.Ct

   The following code fragment:

	one = Cg(Ct"","foo")
	    * Cb("foo")
	    * Cb("foo")

	print(one:match "")

   Working from the inside out, here is what the manual says about Ct():

	"Creates a table capture. This capture creates a table and puts all
	values from all anonymous captures made by patt inside this table in
	successive integer keys, starting at 1. Moreover, for each named
	capture group created by patt, the first value of the group is put
	into the table with the group name as its key. The captured value is
	only the table."

The given pattern, "", is an empty string, so it should always match.  It
returns a table as the result of the capture.  Now, Cg():

	"Creates a group capture. It groups all values returned by patt into
	a single capture. The group may be anonymous (if no name is given)
	or named with the given name (which can be any non-nil Lua value).

	"An anonymous group serves to join values from several captures into
	a single capture. A named group has a different behavior. In most
	situations, a named group returns no values at all. Its values are
	only relevant for a following back capture or when used inside a
	table capture."

   I'm taking the capture from Ct() (a table) and associating it with the
name "foo".  I'm not inside a Ct(), so this is only relevant if I'm going to
do a back capture (otherwise, why bother?).  So now we come to Cb():

	"Creates a back capture. This pattern matches the empty string and
	produces the values produced by the most recent group capture named
	name (where name can be any Lua value).

	"Most recent means the last complete outermost group capture with
	the given name. A Complete capture means that the entire pattern
	corresponding to the capture has matched. An Outermost capture means
	that the capture is not inside another complete capture."

   So the first Cb() returns the table returned by Ct().  Why does the second
call to Cb() return a new table?  Cg() groups all values returned by the
pattern into a single capture, so there should only *be* one capture.  Or am
I misreading the manual?  Does it mysteriously "rerun" the pattern each time
it's called?

Perhaps Ct doesn't actually create a table, but rather only signals that a table should be created.

Perhaps this is some sort of optimization for the failure case, so that it doesn't needlessly create tables just to discard them afterwards.

Perhaps this is so it doesn't rely on passing lua_States around.

Those are just my guesses.


   But now let's look at this bit of code:

	two = Cg(Cmt("",function(s,p,c) return p,{} end),"foo")
	    * Cb("foo")
	    * Cb("foo")

	print(two:match "")

   The only change is Cmt():

	"Creates a match-time capture. Unlike all other captures, this one
	is evaluated immediately when a match occurs. It forces the
	immediate evaluation of all its nested captures and then calls
	function.

	"The given function gets as arguments the entire subject, the
	current position (after the match of patt), plus any capture values
	produced by patt.

	"The first value returned by function defines how the match happens.
	If the call returns a number, the match succeeds and the returned
	number becomes the new current position. (Assuming a subject s and
	current position i, the returned number must be in the range [i,
	len(s) + 1].) If the call returns true, the match succeeds without
	consuming any input. (So, to return true is equivalent to return i.)
	If the call returns false, nil, or no value, the match fails.

	"Any extra values returned by the function become the values
	produced by the capture."

Yes, because, as the manual says, Cmt is evaluated immediately and its results are actually used, not just signaled.


   And now the two calls to Cb() return the same table.  I think the
confusion here is how

	Cg(Ct"","foo")

seems to keep returning new captures for the given pattern for each call to
Cb(), whereas

	Cg(Cmt("",function(s,p,c) return p,{} end),"foo")

returns the same capture for each call to Cb().  What, exactly, is the
distinction between the two?
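
For readers who want to reproduce the comparison, here is a minimal sketch that puts both patterns side by side and checks table identity with rawequal. It assumes LPeg is installed and loadable as "lpeg"; the local names (a, b, c, d) are only illustrative. The "distinct" result for the Ct version is the behavior reported in this thread, not something the manual promises.

```lua
local lpeg = require "lpeg"
local Cg, Cb, Ct, Cmt = lpeg.Cg, lpeg.Cb, lpeg.Ct, lpeg.Cmt

-- Named group around a plain table capture, read back twice.
local one = Cg(Ct"", "foo") * Cb("foo") * Cb("foo")
local a, b = one:match ""
-- Reported in this thread as two distinct tables.
print("Ct : same table?", rawequal(a, b))

-- Same shape, but the table is built inside a match-time capture.
local two = Cg(Cmt("", function(s, p) return p, {} end), "foo")
          * Cb("foo") * Cb("foo")
local c, d = two:match ""
-- The function runs once, during the match, so both back captures
-- produce the one table it returned.
print("Cmt: same table?", rawequal(c, d))
```

Comparing with rawequal rather than by contents matters here, since both versions yield empty tables that look identical when printed field by field.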

If we must accumulate something, we should use a fold.

   Again, given the following:

	lpeg = require "lpeg"
	Cf   = lpeg.Cf
	Ct   = lpeg.Ct
	C    = lpeg.C

   I wish to collect values into a table.  You may even say I wish to
"accumulate" said values into a table.  So, according to you, I should use a
fold for this:

	one = Cf(Ct"" * C(1) * C(1),function(t,v) t[#t+1] = v return t end)
	x = one:match"ab"
	print(x[1],x[2])

but doesn't this do the same thing?

	two = Ct(C(1) * C(1)) -- nary a fold in sight!
	x = two:match"ab"
	print(x[1],x[2])
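
For contrast, here is a small sketch of an accumulation that a table capture alone does not express: folding captured numbers into a single sum. It follows the well-known summing example from the LPeg documentation and assumes an LPeg version in which lpeg.Cf is available; the names number, list and sum are only illustrative.

```lua
local lpeg = require "lpeg"

-- One or more digits, converted to a Lua number.
local number = lpeg.R"09"^1 / tonumber
-- Comma-separated list of numbers.
local list = number * ("," * number)^0
-- Fold: the first capture seeds the accumulator, and each
-- following capture is added to it.
local sum = lpeg.Cf(list, function(acc, n) return acc + n end)

print(sum:match "10,30,43")  --> 83
```

Here the accumulator is a plain number and the folding function has no side effects, which is the style the quoted advice seems to be recommending.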

   -spc (or am I being too pedantic?)


Just my guesses.

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.