- Subject: Re: help with lpeg
- From: Sean Conner <sean@...>
- Date: Thu, 27 Dec 2012 05:42:18 -0500
It was thus said that the Great Cosmin Apreutesei once stated:
> Hi,
>
> I'm trying to parse http headers with lpeg.re as an exercise for
> learning lpeg and because http has a few cases that need recursive
> parsing. There's a few things I don't know how to express in lpeg.re
> (or lpeg) yet.
>
> For instance, I have this syntax: k1=v1,k2=v2,... then v1 and v2 have
> themselves a different syntax depending on the keys.
>
> Consider this:
>
> list <- element (',' element)*
> element <- length / name
> length <- kv -- but I also want k <- 'length' and v <- length_value in
> order to succeed
> name <- kv -- but I also want k <- 'name' and v <- name_value in order
> to succeed
> kv <- k '=' v
> k <- {[^=]+}
> v <- {[^,]*}
> length_value <- [0-9]+
> name_value <- [a-z]+
>
> I want both <length> and <name> to conform to kv as above, but I also
> want the captured value of <length> to conform to <length_value> and
> the captured value of <name> to conform to <name_value>. Basically I
> want to be able to do more parsing on the captures before succeeding
> on a match. Can I express something like that? I know I can do element
> <- kv -> parse_kv and do further matching inside the parse_kv
> function, but I wanted to avoid fragmenting the parser in multiple
> stages like that.
>
> Any hints appreciated. Thanks.
I would do this as:

    list         <- element (COMMA element)*
    element      <- length
                  / name
    length       <- 'length' EQ length_value
    name         <- 'name' EQ name_value
    length_value <- %d+
    name_value   <- [a-z]+
    COMMA        <- ','
    EQ           <- '='
Yes, both the length and name fields have a similar structure, but since
they're logically of different semantic types, it makes sense (to me) to
separate them. The reason I broke out the ',' and '=' signs as their own
productions is to provide a bit of documentation, and to make it easier to
add whitespace:

    COMMA <- %s* ',' %s*
    EQ    <- %s* '=' %s*
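
A minimal sketch of the grammar above as a runnable check, assuming the re module that ships with LPeg is installed. The trailing `!.` (end of input) is not part of the grammar as posted; it is added here only so that input with trailing garbage fails instead of matching a prefix.

```lua
local re = require "re"

-- Same productions as above, with whitespace-tolerant COMMA and EQ.
-- The "!." at the end of `list` is an addition for this sketch.
local p = re.compile[[
  list         <- element (COMMA element)* !.
  element      <- length / name
  length       <- 'length' EQ length_value
  name         <- 'name' EQ name_value
  length_value <- %d+
  name_value   <- [a-z]+
  COMMA        <- %s* ',' %s*
  EQ           <- %s* '=' %s*
]]

-- With no captures, a successful match returns the position just past it.
assert(p:match("name=foobar,length=33"))
assert(p:match("name = foobar , length = 33"))

-- The key selects the value syntax, so a mismatched value fails the match.
assert(p:match("length=abc") == nil)
assert(p:match("name=123") == nil)
```

Because each key has its own production, the value is checked against the right syntax during the match itself, with no second parsing stage.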
(What I really need to do is post my LPeg grammar for parsing email
headers. It really showcases nearly all the features of the re module, but
until I get around to that, if anyone is interested, I can mail them a copy.
I should note that HTTP headers are pretty much the same format as email
headers.)
But change the code slightly, and you can get back a Lua table:

    local re = require "re"

    local G = [[
        header       <- list -> {}
        list         <- element (COMMA element)*
        element      <- length
                      / name
        length       <- 'length' EQ {:length: length_value :}
        name         <- 'name' EQ {:name: name_value :}
        length_value <- %d+
        name_value   <- [a-z]+
        COMMA        <- %s* ',' %s*
        EQ           <- %s* '=' %s*
    ]]

    local p = re.compile(G)
    local x = p:match[[name = foobar , length = 33]]
    print(x.name,x.length)

    foobar  33
Note, though, that if a key is repeated (two lengths, for example), only
the last value is returned. Storing each value (for repeats) is left as an
exercise for the reader.
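
One possible take on that exercise, as a sketch rather than the only way to do it: capture each key and value as plain captures and hand the whole flat list to a Lua function through the `defs` argument of `re.compile`. The helper name `collect` is hypothetical, and so is the choice to store every key as a list of values.

```lua
local re = require "re"

-- Hypothetical helper: receives k1, v1, k2, v2, ... and groups the values
-- by key, so repeated keys keep every value in order of appearance.
local function collect(...)
  local out = {}
  for i = 1, select('#', ...), 2 do
    local k, v = select(i, ...)
    out[k] = out[k] or {}
    out[k][#out[k] + 1] = v
  end
  return out
end

local p = re.compile([[
  header       <- list -> collect
  list         <- element (COMMA element)*
  element      <- length / name
  length       <- {'length'} EQ {length_value}
  name         <- {'name'} EQ {name_value}
  length_value <- %d+
  name_value   <- [a-z]+
  COMMA        <- %s* ',' %s*
  EQ           <- %s* '=' %s*
]], { collect = collect })

local x = p:match("name = foo , length = 33 , length = 44")
print(x.name[1], x.length[1], x.length[2])
```

The values come back as strings, so the lengths still need `tonumber` if they are to be used as numbers.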
-spc (Who would really love folding captures in the re module ... )