  I managed to generate a segfault with LPEG and I can reproduce the issue
with this code [1]:

local lpeg = require "lpeg"
local Cg = lpeg.Cg
local Cc = lpeg.Cc
local Cb = lpeg.Cb
local P  = lpeg.P

local cnt = Cg(Cc(0),'count')
          * (P(1) * Cg(Cb'count' / function(c) return c + 1 end,'count'))^0
          * Cb'count'

print(cnt:match(string.rep("x",512)))
print(cnt:match(string.rep("x",512+128))) -- CRASH at some point past this line
print(cnt:match(string.rep("x",512+128+32)))
print(cnt:match(string.rep("x",512+128+32+16)))
print(cnt:match(string.rep("x",512+128+32+16+4)))
print(cnt:match(string.rep("x",512+128+32+16+8)))
print(cnt:match(string.rep("x",512+128+64)))
print(cnt:match(string.rep("x",512+256)))
print(cnt:match(string.rep("x",1024)))

What I did not expect was the 2,900 C call stack entries this produced (thus
causing the crash).  The LPEG documentation is terse on the subject.  My
intent was to count a series of captures while avoiding an external
variable.  Yes, I could use a state table with lpeg.Carg() but I was trying
to avoid that [2].  When reading the LPEG documentation looking for a
warning about this (especially the large call stack) I didn't find much.
There is this line:

	Therefore, captures should avoid side effects.

but the previous parenthetical gives an example of a "side effect":

	(As an example, consider the pattern lpeg.P"a" / func / 0. Because
	the "division" by 0 instructs LPeg to throw away the results from
	the pattern, LPeg may or may not call func.)

So I'm taking this to mean the normal types of side effects in programming.
Then there's this bit about lpeg.Cb():

	Creates a *back capture*. This pattern matches the empty string and
	produces the values produced by the *most recent* group capture
	named name (where name can be any Lua value).

	*Most recent* means the last *complete outermost group* capture with
	the given name. A *Complete* capture means that the entire pattern
	corresponding to the capture has matched. An *Outermost* capture
	means that the capture is not inside another complete capture.

	In the same way that LPeg does not specify when it evaluates
	captures, it does not specify whether it reuses values previously
	produced by the group or re-evaluates them.

That last bit *could* mean (in the sample code above) that lpeg.Cb()
re-evaluates the lpeg.Cc(0) each time it is used, and thus my attempt to
count using lpeg.Cg() and lpeg.Cb() could return 0 or 1 just as well as the
actual count [4].
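
To make the "most recent" rule quoted above concrete, here is a minimal
sketch with two complete groups that share a name.  The group name 'x' and
the constant values are made up for illustration.  Because both groups hold
constant captures, it should not matter here whether LPEG reuses or
re-evaluates the group's values:

```lua
local lpeg = require "lpeg"
local Cg, Cc, Cb = lpeg.Cg, lpeg.Cc, lpeg.Cb

-- Two complete, outermost groups named 'x'; neither produces a value
-- on its own because both are named groups.
local p = Cg(Cc"first", 'x') * Cg(Cc"second", 'x') * Cb'x'

-- Cb'x' refers back to the later of the two groups.
print(p:match(""))
```

This prints "second", the value of the last complete group named 'x'.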

  Yes, I admit to not reading the manual with a magnifying glass and
tweezers, but this terseness of the manual language (of both Lua and LPEG)
is a recurring theme on this list.  Perhaps a few examples could help
clarify things?

  -spc (Will do stupid things in code ... )

[1]	This just reproduces the issue---I know there are better ways to get
	the length of a string.  

[2]	It requires additional parameters to lpeg.match(), thus complicating
	the call.
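
	For comparison, a rough sketch of the state table approach.  The
	names item, state and count are hypothetical, and using lpeg.Cmt()
	to do the increment is my assumption about one way to keep the
	side effect predictable, since the manual says match-time captures
	are evaluated immediately when the match occurs:

```lua
local lpeg = require "lpeg"
local P, Carg, Cmt = lpeg.P, lpeg.Carg, lpeg.Cmt

-- Match one character, then bump state.count right away (at match time).
-- Carg(1) produces the first extra argument passed to match().
local item = Cmt(P(1) * Carg(1), function(_, _, state)
  state.count = state.count + 1
  return true -- succeed without consuming more input, produce no values
end)

-- After the loop, return the final counter as the only capture.
local cnt = item^0 * (Carg(1) / function(state) return state.count end)

-- Hypothetical wrapper that hides the extra parameters from callers.
local function count(s)
  return cnt:match(s, 1, { count = 0 })
end

print(count(string.rep("x", 1024)))
```

	The extra "1, { count = 0 }" arguments are the complication
	mentioned above.  Note that a match-time capture also runs for
	attempts that are later backtracked over, so this only counts
	correctly when the loop never backtracks, as is the case here.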

[3]	Usually with a pattern like:

		pattern = Cg(Cc"unknown",'name')
		        * Cg(R("az","AZ")^1,'name')^-1
		parse   = Ct(pattern)

	to define a default value in case a pattern doesn't exist.
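
	A self-contained version of that fragment, as a sketch.  It relies
	on a later named group overwriting an earlier one of the same name
	inside lpeg.Ct(), which is the behavior I see in practice; the
	sample subjects are made up:

```lua
local lpeg = require "lpeg"
local Cg, Cc, Ct, R = lpeg.Cg, lpeg.Cc, lpeg.Ct, lpeg.R

-- Default value first, then an optional group with the same name.
local pattern = Cg(Cc"unknown", 'name')
              * Cg(R("az", "AZ")^1, 'name')^-1
local parse   = Ct(pattern)

print(parse:match("Alice").name) -- the optional group matched
print(parse:match("").name)      -- falls back to the default
```

	The first call prints "Alice" and the second prints "unknown".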

[4]	I would be interested in knowing of other LPEG implementations.