lua-users home
lua-l archive


It was thus said that the Great Cosmin Apreutesei once stated:
> Hi,
> 
> I'm trying to parse http headers with lpeg.re as an exercise for
> learning lpeg and because http has a few cases that need recursive
> parsing. There's a few things I don't know how to express in lpeg.re
> (or lpeg) yet.
> 
> For instance, I have this syntax: k1=v1,k2=v2,... then v1 and v2 have
> themselves a different syntax depending on the keys.
> 
> Consider this:
> 
> list <- element (',' element)*
> element <- length / name
> length <- kv -- but I also want k <- 'length' and v <- length_value in
> order to succeed
> name <- kv -- but I also want k <- 'name' and v <- name_value in order
> to succeed
> kv <- k '=' v
> k <- {[^=]+}
> v <- {[^,]*}
> length_value <- [0-9]+
> name_value <- [a-z]+
> 
> I want both <length> and <name> to conform to kv as above, but I also
> want the captured value of <length> to conform to <length_value> and
> the captured value of <name> to conform to <name_value>. Basically I
> want to be able to do more parsing on the captures before succeeding
> on a match. Can I express something like that? I know I can do element
> <- kv -> parse_kv and do further matching inside the parse_kv
> function, but I wanted to avoid fragmenting the parser in multiple
> stages like that.
> 
> Any hints appreciated. Thanks.

I would do this as:

	list		<- element (COMMA element)*
	element		<- length
			/  name

	length		<- 'length' EQ length_value
	name		<- 'name'   EQ name_value

	length_value	<- %d+
	name_value	<- [a-z]+

	COMMA		<- ','
	EQ		<- '='
	
  Yes, both the length and name fields have a similar structure, but since
they're logically of different semantic types, it makes sense (to me) to
separate them.  The reason I broke out the ',' and '=' signs as their own
productions is to provide a bit of documentation, and to make it easier to
add whitespace:

	COMMA		<- %s* ',' %s*
	EQ		<- %s* '=' %s*
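
A minimal sketch of how the whitespace-tolerant version could be tried out,
assuming LPeg and its re module are installed.  The trailing `!.` (which
requires the whole input to be consumed) and the name `check` are additions
for this sketch, not part of the grammar above.  With no captures in the
grammar, match returns the position just past the match, or nil on failure.

```lua
local re = require "re"

-- Grammar from above, using the whitespace-tolerant COMMA and EQ.
-- '!.' is an addition for this sketch: fail unless all input is consumed.
local check = re.compile([[
	list		<- element (COMMA element)* !.
	element		<- length
			/  name

	length		<- 'length' EQ length_value
	name		<- 'name'   EQ name_value

	length_value	<- %d+
	name_value	<- [a-z]+

	COMMA		<- %s* ',' %s*
	EQ		<- %s* '=' %s*
]])

-- Both spellings are accepted: a position (a number) is printed.
print(check:match("name=foobar,length=33"))
print(check:match("name = foobar , length = 33"))

-- A length whose value is not numeric is rejected: nil is printed.
print(check:match("length = foobar"))
```

Because each key is tied to its own value rule, the value is validated in
the same pass, with no second parsing stage on the captures.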

(What I really need to do is post my LPeg grammar for parsing email
headers, since it showcases nearly all the features of the re module.  Until
I get around to that, if anyone is interested, I can mail them a copy;
I should note that HTTP headers are pretty much the same format as email
headers)

  But, change the code slightly, and you can get back a Lua table:

local re = require "re"

G = [[
header          <- list -> {}
list            <- element (COMMA element)*

element         <- length
                /  name
     
length          <- 'length' EQ {:length: length_value :}
name            <- 'name'   EQ {:name:   name_value   :}
                         
length_value    <- %d+        
name_value      <- [a-z]+     
                              
COMMA           <- %s* ',' %s*
EQ              <- %s* '=' %s*
]]

p = re.compile(G)
x = p:match[[name = foobar , length = 33]]
print(x.name,x.length)
foobar	33

  Note that with this grammar a repeated field (two lengths, for example)
will only return the last value, since each named capture overwrites the
previous one in the table.  Storing each value (for repeats) is left as an
exercise for the reader.
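
One possible approach to that exercise, as a sketch only: wrap each element
in its own table capture, so the header becomes a list of key/value records
and repeats are kept in order.  The field names `key` and `value` and the
variable names `plist` and `items` are made up for this example.

```lua
local re = require "re"

-- Each element yields its own small table {key=..., value=...};
-- 'header' collects those tables, in input order, into one array.
local plist = re.compile([[
header          <- list -> {}
list            <- element (COMMA element)*

element         <- (length / name) -> {}

length          <- {:key: 'length' :} EQ {:value: length_value :}
name            <- {:key: 'name'   :} EQ {:value: name_value   :}

length_value    <- %d+
name_value      <- [a-z]+

COMMA           <- %s* ',' %s*
EQ              <- %s* '=' %s*
]])

local items = plist:match("length = 1 , name = foo , length = 22")

-- Every occurrence survives, including the repeated 'length'.
for i, item in ipairs(items) do
  print(i, item.key, item.value)
end
```

The values come back as strings; converting a length with tonumber, or
grouping the records by key, can be done afterwards in plain Lua.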

  -spc (Who would really love folding captures in the re module ... )