- Subject: LPeg back captures
- From: Sean Conner <sean@...>
- Date: Wed, 23 Oct 2013 21:30:15 -0400
Okay, I may not fully understand back captures in LPeg. Here's the
problem: I'm attempting to parse NAPTR DNS records. Once I obtain a given
record, I have a string in the form of: [1]
!^.*$!pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER!
(this is the regexp portion of the NAPTR record). RFC-3402 gives the
following grammar for this field:
    subst-expr   = delim-char ere delim-char repl delim-char *flags
    delim-char   = "/" / "!" / <Any octet not in 'POS-DIGIT' or 'flags'>
                   ; All occurrences of a delim_char in a subst_expr
                   ; must be the same character.>
    ere          = <POSIX Extended Regular Expression>
    repl         = *(string / backref)
    string       = *(anychar / escapeddelim)
    anychar      = <any character other than delim-char>
    escapeddelim = "\" delim-char
    backref      = "\" POS-DIGIT
    flags        = "i"
    POS-DIGIT    = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
I have this translated into LPeg:
    local lpeg = require "lpeg"
    local P, R, C, Cb, Cg, Ct = lpeg.P, lpeg.R, lpeg.C, lpeg.Cb, lpeg.Cg, lpeg.Ct

    DIGIT        = R"09"
    delim_char   = P"!" -- Cb("delim")
    flags        = P"i"
    backref      = P"\\" * DIGIT
    escapeddelim = P"\\" * delim_char
    anychar      = P(1) - delim_char
    string       = (escapeddelim + anychar)^1
    repl         = C((string + backref)^0)
    ere          = C((P(1) - delim_char)^0)
    idelim_char  = Cg(P"/" + P"!" + (P(1) - (DIGIT + flags)),"delim")
    regexp       = Ct(
                       idelim_char
                     * Cg(ere,"re")
                     * delim_char
                     * Cg(repl,"replace")
                     * delim_char
                     * Cg(flags^0,"flags")
                   )
and it works, except for the hardcoded delimiter. If I leave delim_char as
is, I get the expected data:
    regexp =
    {
      re = "^.*$",
      flags = "",
      replace = "pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER",
      delim = "!",
    }
But if I try to use a backreference (delim_char = Cb("delim")), it doesn't work:
    regexp =
    {
      [1] = "!",
      [2] = "!",
      replace = "",
      flags = "",
      re = "",
      delim = "!",
    }
I'm wondering, am I using back references correctly? The example in the
LPeg documentation [2] is close to what I want, but I'm missing something.
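One possible explanation, offered as a sketch rather than a confirmed answer:
per the LPeg documentation, Cb matches the empty string and only produces the
values of the named group; it never compares them against the subject. That
would account for the result above, since "P(1) - delim_char" can then never
match, so "re" and "replace" come out empty, while each bare delim_char adds
a positional "!" to the table. The long-string example in the documentation
wraps Cb in a match-time capture (lpeg.Cmt) to do the comparison, and the same
idea could be adapted here. The name str (used instead of string, to avoid
shadowing Lua's string library) and the local declarations are choices made
for this sketch, not part of the original code.

```lua
local lpeg = require "lpeg"
local P, R, C, Cb, Cg, Cmt, Ct =
      lpeg.P, lpeg.R, lpeg.C, lpeg.Cb, lpeg.Cg, lpeg.Cmt, lpeg.Ct

local DIGIT = R"09"
local flags = P"i"

-- Consume one character and succeed only if it equals the value of the
-- earlier "delim" group. Returning a bare boolean adds no capture values.
local delim_char = Cmt(C(1) * Cb("delim"), function(_, _, ch, delim)
  return ch == delim
end)

local backref      = P"\\" * DIGIT
local escapeddelim = P"\\" * delim_char
local anychar      = P(1) - delim_char
local str          = (escapeddelim + anychar)^1  -- "string" in the original
local repl         = C((str + backref)^0)
local ere          = C((P(1) - delim_char)^0)
local idelim_char  = Cg(P"/" + P"!" + (P(1) - (DIGIT + flags)), "delim")

local regexp = Ct(
    idelim_char
  * Cg(ere, "re")
  * delim_char
  * Cg(repl, "replace")
  * delim_char
  * Cg(flags^0, "flags")
)

local input = "!^.*$!pstndata:cnam/+15714344048;;charset=us-ascii;"
           .. "ds=local;score=98,gn=CRYSTA;sn=SPERBER!"
local t = regexp:match(input)
print(t.delim, t.re, t.replace, t.flags)
```

On the sample record this should yield the same four fields as the hardcoded
version. Returning true from the Cmt function keeps the position just after
the consumed character and produces no values, so no stray positional entries
should end up in the table.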
I know I can use Lua's built-in pattern matching to break this string
apart, but I'd rather use LPeg, if only to figure out how to parse this type
of data.
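For comparison, the plain-Lua alternative mentioned above might look roughly
like the following. This is a minimal sketch: split_subst is a hypothetical
helper name, it does not handle escaped delimiters ("\" delim-char) inside the
fields, and it does not check that the delimiter is a legal delim-char.

```lua
-- Split a NAPTR substitution expression using Lua string patterns only.
-- Assumes the delimiter never appears escaped inside the fields.
local function split_subst(s)
  local delim = s:sub(1, 1)
  -- Escape the delimiter if it is punctuation, so it is taken literally
  -- inside the pattern (e.g. "!" becomes "%!").
  local d = delim:gsub("%p", "%%%0")
  local re, replace, flags =
    s:match("^" .. d .. "(.-)" .. d .. "(.-)" .. d .. "(i*)$")
  if not re then
    return nil
  end
  return { delim = delim, re = re, replace = replace, flags = flags }
end

local r = split_subst("!^.*$!pstndata:cnam/+15714344048;;charset=us-ascii;"
                   .. "ds=local;score=98,gn=CRYSTA;sn=SPERBER!")
print(r.delim, r.re, r.replace, r.flags)
```

Because the delimiter is read from the first character, nothing is hardcoded,
but a record containing an escaped delimiter would be split in the wrong
place, which is one reason to prefer a real grammar.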
-spc (Who's really puzzled by this)
[1] All characters appearing in this work are fictitious. Any
resemblance to real persons, living or dead, is purely coincidental.
[2] http://www.inf.puc-rio.br/~roberto/lpeg/