  Okay, I may not fully understand back captures in LPeg.  Here's the
problem:  I'm attempting to parse NAPTR DNS records.  Once I obtain a given
record, I have a string in the form of: [1]

!^.*$!pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER!

(this is the regexp portion of the NAPTR record).  RFC 3402 gives the
following grammar for this field:

	subst-expr   = delim-char  ere  delim-char  repl  delim-char  *flags
	delim-char   = "/" / "!" / <Any octet not in 'POS-DIGIT' or 'flags'>
	                   ; All occurrences of a delim_char in a subst_expr
	                   ; must be the same character.>
	ere          = <POSIX Extended Regular Expression>
	repl         = *(string / backref)
	string       = *(anychar / escapeddelim)
	anychar      = <any character other than delim-char>
	escapeddelim = "\" delim-char
	backref      = "\" POS-DIGIT
	flags        = "i"
	POS-DIGIT    = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"

  I have this translated into LPeg:

	DIGIT        = R"19" -- POS-DIGIT in the RFC grammar is 1-9, not 0-9
	delim_char   = P"!" -- Cb("delim")               
	flags        = P"i"
	backref      = P"\\" * DIGIT
	escapeddelim = P"\\" * delim_char
	anychar      = P(1) - delim_char                
	string       = (escapeddelim + anychar)^1
	repl         = C((string + backref)^0)
	ere          = C((P(1) - delim_char)^0)
	idelim_char  = Cg(P"/" + P"!" + (P(1) - (DIGIT + flags)),"delim")
	regexp       = Ct(
	                    idelim_char
	                    * Cg(ere,"re")
	                    * delim_char  
	                    * Cg(repl,"replace")
	                    * delim_char 
	                    * Cg(flags^0,"flags")
	                 )

and it works, except for the hardcoded delimiter.  If I leave delim_char as
is, I get the expected data:

	regexp =
	{
	  re = "^.*$",
	  flags = "",
	  replace = "pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER",
	  delim = "!",
	}

But if I try to use a backreference (delim_char = Cb("delim")), it doesn't work:

	regexp =
	{
	  [1] = "!",
	  [2] = "!",
	  replace = "",
	  flags = "",
	  re = "",
	  delim = "!",
	}
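
  A minimal sketch of what may be going on here, going by the LPeg manual's
description of lpeg.Cb: a back capture matches the empty string and only
produces the value of the named group, so it never consumes the delimiter
from the subject.  The pattern and the input "!abc" below are made up purely
for illustration:

```lua
local lpeg = require "lpeg"
local P, C, Cb, Cg, Ct = lpeg.P, lpeg.C, lpeg.Cb, lpeg.Cg, lpeg.Ct

-- Hypothetical toy pattern: remember the first character as "delim",
-- then use a back capture, then capture whatever is left.
local toy = Ct(
    Cg(P"!", "delim")   -- named group, value "!"
  * Cb("delim")         -- produces "!" again but consumes no input
  * C(P(1)^0)           -- captures the rest of the subject
)

local t = toy:match("!abc")
print(t.delim, t[1], t[2])
```

If that reading is right, it would account for the table above:
P(1) - Cb("delim") can never succeed because Cb("delim") always matches, so
re and replace come out empty, and each use of delim_char adds one more "!"
as a positional entry.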

  I'm wondering, am I using back references correctly?  The example in the
LPeg documentation [2] is close to what I want, but I'm missing something. 
I know I can use Lua's built-in string patterns to break this string
apart, but I'd rather use LPeg, if only to figure out how to parse this type
of data.
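
  One approach that might work, sketched here under assumptions rather than
as a definitive answer: wrap the back capture in a match-time capture
(lpeg.Cmt) so that the remembered delimiter is compared against the subject
and consumed.  The names POS_DIGIT, str and parse_regexp are my own for this
sketch; the rest follows the grammar above.

```lua
local lpeg = require "lpeg"
local P, R, C, Cb, Cg, Cmt, Ct =
      lpeg.P, lpeg.R, lpeg.C, lpeg.Cb, lpeg.Cg, lpeg.Cmt, lpeg.Ct

local POS_DIGIT = R"19"
local flags     = P"i"

-- Match-time check: fetch the value of the "delim" group and require
-- that the subject holds that same text at the current position.
local delim_char = Cmt(Cb("delim"), function(subject, pos, delim)
  if subject:sub(pos, pos + #delim - 1) == delim then
    return pos + #delim   -- consume the delimiter, produce no values
  end
  return false            -- not the delimiter: this match attempt fails
end)

-- The initial delimiter is matched literally and remembered as "delim".
local idelim_char  = Cg(P"/" + P"!" + (P(1) - (POS_DIGIT + flags)), "delim")

local backref      = P"\\" * POS_DIGIT
local escapeddelim = P"\\" * delim_char
local anychar      = P(1) - delim_char
local str          = (escapeddelim + anychar)^1
local repl         = C((str + backref)^0)
local ere          = C((P(1) - delim_char)^0)

local regexp = Ct(
    idelim_char
  * Cg(ere, "re")
  * delim_char
  * Cg(repl, "replace")
  * delim_char
  * Cg(flags^0, "flags")
)

-- Hypothetical helper: returns a table on success, nil on failure.
local function parse_regexp(s)
  return regexp:match(s)
end

local r = parse_regexp("!^.*$!pstndata:cnam/+15714344048!")
print(r.delim, r.re, r.replace, r.flags)
```

Because the function returns only a position, the delimiter adds no extra
entries to the result table.  Note that ere, as in the original translation,
still stops at the first delimiter and does not handle an escaped one.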

  -spc (Who's really puzzled by this)

[1]	All characters appearing in this work are fictitious. Any
	resemblance to real persons, living or dead, is purely coincidental.

[2]	http://www.inf.puc-rio.br/~roberto/lpeg/