It was thus said that the Great Dirk Laurie once stated:
> 2017-07-03 21:35 GMT+02:00 Soni L. <fakedme@gmail.com>:
> 
> > That was a feature request. And that trick doesn't work for captures
> > beyond 1 without introducing nils or a massive performance hit.
> 
> Would the requested feature make `pat/-2` drop the second capture
> or the first two captures?

  While I'm sympathetic to the proposal, I don't think it can be done
without a drastic redesign of LPeg at the lowest level.  For example:

	SP = lpeg.P" "
	a  = lpeg.P"a" * SP^0
	b  = lpeg.C(a)

  As of now, b will return:

	"a"	-> "a"
	"ab"	-> "a"
	"a b "	-> "a "
	"a  b"	-> "a  "
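
  For anyone who wants to check the table above, here is a minimal sketch.  It assumes LPeg is installed and loadable as "lpeg"; the `cases` table is only a name made up for this illustration.

```lua
local lpeg = require "lpeg"

local SP = lpeg.P" "
local a  = lpeg.P"a" * SP^0   -- "a" followed by any number of spaces
local b  = lpeg.C(a)          -- capture whatever `a` matched

-- Subjects from the table, paired with the text `b` is expected to capture.
local cases = {
  { "a",    "a"   },
  { "ab",   "a"   },
  { "a b ", "a "  },
  { "a  b", "a  " },
}

for _, case in ipairs(cases) do
  local subject, expected = case[1], case[2]
  local got = b:match(subject)
  print(("%q -> %q"):format(subject, got))
  assert(got == expected)
end
```

  The trailing spaces end up in the capture because lpeg.C() sees only the start and end of what `a` matched, not its individual pieces.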

  The proposal:

	b = lpeg.C(a / -2)

is intended to drop the second pattern, but as far as LPeg is concerned, there
*is* no second pattern.  By itself, lpeg.P() doesn't return captures [1]. 
What lpeg.P() really does is match some (possibly no) input.  So when you write:

	a = lpeg.P"a" * SP^0
	a:match("abcd")

what happens is that lpeg.P"a" matches the literal string "a" and returns 2,
the position just past the match.  The next pattern starts at position 2 and
attempts to apply lpeg.P" " zero or more times.  In this case, it matches zero
times, so it's happy, and also returns 2.  There are no more patterns that need
matching, so the whole thing succeeds.  Now, with:

	a:match("a  bcd")

lpeg.P"a" matches and returns 2.  Then the pattern that matches lpeg.P" "
zero or more times fires, and hey!  We have two spaces, so it's happy, and
returns 4.  Again, no more patterns to match, so we end.
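
  The positions in the two walkthroughs above can be checked directly.  This is just a sketch assuming LPeg is loadable as "lpeg"; it leans on the documented behavior that match() returns the index of the first character after the match when the pattern has no captures, and on the optional second argument to match() giving the starting position.

```lua
local lpeg = require "lpeg"

local SP = lpeg.P" "
local a  = lpeg.P"a" * SP^0

-- The whole pattern: no captures, so match() returns the end position.
print(a:match("abcd"))       -- 2: "a" matched, zero spaces follow
print(a:match("a  bcd"))     -- 4: "a" plus two spaces matched
print(a:match("bcd"))        -- nil: no leading "a", the match fails

-- The two pieces on their own, with the second one started at position 2.
print(lpeg.P"a":match("a  bcd"))    -- 2
print((SP^0):match("abcd", 2))      -- 2: zero spaces is still a success
print((SP^0):match("a  bcd", 2))    -- 4
```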

  Now, let's add a capture around that:

	b = lpeg.C(a)
	b:match("a  bcd")

  lpeg.C() *is* a capture.  It records the current position when it starts,
in this case 1.  It then runs the pattern.  The pattern is a success and
returns 4 (see previous paragraph).  There are no more patterns to match, and
the given pattern succeeded from position 1 up to (but not including) 4.  So
lpeg.C() returns the characters from position 1 to 3 (inclusive).  It's a
*single* capture.  Now, let's use the proposed rule:

	b = lpeg.C(a / -2)

That won't work because there's no second capture to return.  There's only
one.  Okay, what if we write:

	a = lpeg.P"a" * (lpeg.P" "^0 / -1) -- expand SP in place
	b = lpeg.C(a)
	b:match("a  bcd")

  If we do `a:match("a  b")`, lpeg.P"a" runs, matches, and returns 2.  Then
lpeg.P" "^0 runs and returns ... what?  The LPeg documentation states:

	patt / number	the n-th value captured by patt, or no value when
			number is zero

But that's when it acts as a capture on its own.  When it sits inside another capture:

	a = lpeg.P"a" * (lpeg.P"_"^0 / 0) -- the part of a space will
	b = lpeg.P"b" * (lpeg.P"_"^0 / 0) -- be played by '_'
	c = lpeg.C(a * b)

	print(c:match "a_b_")

The capture is "a_b_", because lpeg.P"_"^0 will still return the position
past its match, but the `/0` part is ignored by lpeg.C() in this case.  So
what you want is to mark the end of a at position 2, and resume capturing at
position 3 and somehow have lpeg.C() remember to ignore some input at some
point.
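
  Here is a small sketch of that behavior, assuming LPeg is loadable as "lpeg".  The `second` pattern at the end is not from the example above; it's only there to show what `patt / number` does when patt really has captures to pick from.

```lua
local lpeg = require "lpeg"

local a = lpeg.P"a" * (lpeg.P"_"^0 / 0)
local b = lpeg.P"b" * (lpeg.P"_"^0 / 0)

-- Without lpeg.C(), a * b produces no values at all, so match() falls
-- back to returning the position after the match.
print((a * b):match("a_b_"))           -- 5

-- Wrapped in lpeg.C(), the whole matched span comes back as one capture;
-- the nested `/ 0` pieces add nothing and remove nothing.
local c = lpeg.C(a * b)
print(c:match("a_b_"))                 -- a_b_
print(select("#", c:match("a_b_")))    -- 1

-- With real captures inside, `patt / number` selects one of them.
local second = (lpeg.C"x" * lpeg.C"y") / 2
print(second:match("xy"))              -- y
```

  In other words, `/ number` picks among values that captures have already produced; it has no say over which characters an enclosing lpeg.C() covers.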

  And I think that doing so will fundamentally have to change how LPeg
works.

  -spc (So I wouldn't expect to see this feature any time soon ... )

[1]	Except if given a function, in which case it's equivalent to
	lpeg.Cmt() over the empty string, but we'll ignore that for now.
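
  For the curious, footnote [1] can be sketched like this, assuming LPeg is loadable as "lpeg".  The function `mark` is hypothetical, invented for this illustration: it consumes no input (it returns the position it was handed) and produces one extra value, which becomes a capture.

```lua
local lpeg = require "lpeg"

-- Hypothetical match-time function: succeed at the current position
-- without consuming anything, and yield one value.
local function mark(subject, pos)
  return pos, "mark@" .. pos
end

local as_p   = lpeg.P"a" * lpeg.P(mark)
local as_cmt = lpeg.P"a" * lpeg.Cmt(lpeg.P(true), mark)

print(as_p:match("abc"))     -- mark@2
print(as_cmt:match("abc"))   -- mark@2
```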