- Subject: Re: LPeg back captures
- From: Andrew Starks <andrew.starks@...>
- Date: Wed, 23 Oct 2013 22:04:36 -0500
On Wednesday, October 23, 2013, Sean Conner wrote:
Okay, I may not fully understand back captures in LPeg. Here's the
problem: I'm attempting to parse NAPTR DNS records. Once I obtain a given
record, I have a string in the form of: [1]
!^.*$!pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER!
(this is the regexp portion of the NAPTR record). RFC-3402 gives the
following grammar for this field:
subst-expr = delim-char ere delim-char repl delim-char *flags
delim-char = "/" / "!" / <Any octet not in 'POS-DIGIT' or 'flags'>
; All occurrences of a delim_char in a subst_expr
; must be the same character.>
ere = <POSIX Extended Regular Expression>
repl = *(string / backref)
string = *(anychar / escapeddelim)
anychar = <any character other than delim-char>
escapeddelim = "\" delim-char
backref = "\" POS-DIGIT
flags = "i"
POS-DIGIT = "1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9"
I have this translated into LPeg:
DIGIT = R"09"
delim_char = P"!" -- Cb("delim")
flags = P"i"
backref = P"\\" * DIGIT
escapeddelim = P"\\" * delim_char
anychar = P(1) - delim_char
string = (escapeddelim + anychar)^1
repl = C((string + backref)^0)
ere = C((P(1) - delim_char)^0)
idelim_char = Cg(P"/" + P"!" + (P(1) - (DIGIT + flags)),"delim")
regexp = Ct(
idelim_char
* Cg(ere,"re")
* delim_char
* Cg(repl,"replace")
* delim_char
* Cg(flags^0,"flags")
)
and it works, except for the hardcoded delimiter. If I leave delim_char as
is, I get the expected data:
regexp =
{
re = "^.*$",
flags = "",
replace = "pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER",
delim = "!",
}
But if I try to use a backreference (delim_char = Cb("delim")), it doesn't work:
regexp =
{
[1] = "!",
[2] = "!",
replace = "",
flags = "",
re = "",
delim = "!",
}
I'm wondering, am I using back references correctly? The example in the
LPeg documentation [2] is close to what I want, but I'm missing something.
I know I can use Lua's builtin regular expressions to break this string
apart, but I'd rather use LPeg, if only to figure out how to parse this type
of data.
-spc (Who's really puzzled by this)
[1] All characters appearing in this work are fictitious. Any
resemblance to real persons, living or dead, is purely coincidental.
[2] http://www.inf.puc-rio.br/~roberto/lpeg/
Crap, sorry. That was wrong. I mixed up the other half of it, which is Cg.
Okay, first, I believe that you need to have your "delim" defined in a Cg before the back capture, which I didn't see. So:
delim = <some pattern>
delim_pair = Cg(delim, "delim")
somepat = P(term * delim_pair * otherterm * Cb("delim"))
In this context, Cg does not provide key=value. It names the capture so that you can reuse it by name, like Cc, but with a value tied to a previous capture.
I know that you probably know that. I say it in case I'm wrong and someone can correct me. :)
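One more thing that may explain the odd output: as far as I can tell from the LPeg manual, Cb("delim") only produces the saved value and matches the empty string. It never compares that value against the subject. That would mean P(1) - Cb("delim") always fails, which fits the empty "re" and "replace" and the two extra "!" entries. Below is a rough sketch of one way around it, following the long-string example in the manual: wrap the back capture in Cmt and do the comparison by hand. The local names (str, POS_DIGIT) and the second test subject are made up for illustration, and I use R"19" because the grammar says POS-DIGIT.

```lua
local lpeg = require "lpeg"
local P, R, C, Cg, Cb, Ct, Cmt = lpeg.P, lpeg.R, lpeg.C, lpeg.Cg, lpeg.Cb, lpeg.Ct, lpeg.Cmt

local POS_DIGIT = R"19"
local flags     = P"i"

-- First delimiter: any octet that is not a POS-DIGIT or a flag.
-- The named group saves it so later patterns can look it up.
local idelim_char = Cg(P(1) - (POS_DIGIT + flags), "delim")

-- Later delimiters: Cb only supplies the saved value, so Cmt compares
-- it with the subject at the current position and consumes it on a match.
local delim_char = Cmt(Cb("delim"), function(subject, pos, delim)
  if subject:sub(pos, pos + #delim - 1) == delim then
    return pos + #delim   -- new position, just past the delimiter
  end
  return false            -- not the delimiter here: fail
end)

local backref      = P"\\" * POS_DIGIT
local escapeddelim = P"\\" * delim_char
local anychar      = P(1) - delim_char
local str          = (escapeddelim + anychar)^1  -- named str to avoid shadowing the string library
local repl         = C((str + backref)^0)
local ere          = C((P(1) - delim_char)^0)

local regexp = Ct(
    idelim_char
  * Cg(ere, "re")
  * delim_char
  * Cg(repl, "replace")
  * delim_char
  * Cg(flags^0, "flags")
)

-- The record from the original question, delimited by "!".
local t = regexp:match("!^.*$!pstndata:cnam/+15714344048;;charset=us-ascii;ds=local;score=98,gn=CRYSTA;sn=SPERBER!")
print(t.delim, t.re, t.replace, t.flags)

-- A hypothetical record delimited by "/" instead, with a flag.
local u = regexp:match("/^(.*)$/sip:\\1@example.com/i")
print(u.delim, u.re, u.replace, u.flags)
```

Since the function passed to Cmt returns no extra values, the delimiter checks should add nothing to the result table, so only the named fields remain. I have not tried this against escaped delimiters inside the ere part, which the grammar above does not define anyway.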
-Andrew