Re: help with lpeg

lua-l archive

Subject: Re: help with lpeg
From: Cosmin Apreutesei <cosmin.apreutesei@...>
Date: Thu, 27 Dec 2012 18:47:35 +0200

Hi Sean, thanks for responding. My comments inline.

> I would do this as:
>
>         list            <- element (COMMA element)*
>         element         <- length
>                         /  name
>
>         length          <- 'length' EQ length_value
>         name            <- 'name'   EQ name_value
>
>         length_value    <- %d+
>         name_value      <- [a-z]+
>
>         COMMA           <- ','
>         EQ              <- '='
>
>   Yes, both the length and name fields have a similar structure, but since
> logically, they're of different semantic types, it makes sense (to me) to
> separate them.  The reason I broke out the ',' and '=' sign as their own
> productions is to provide a bit of documentation, and make it easier to add
> whitespace:
>
>         COMMA           <- %s* ',' %s*
>         EQ              <- %s* '=' %s*
>

The problem here is that the keywords are case-insensitive. In practice
it's even more complicated: you can also escape characters with \code
in the middle of a keyword, but only when the keyword is between
double quotes. Ha.
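For the case-insensitivity part alone, one possible approach is to build
each keyword pattern from per-letter character sets. This is a minimal
sketch, assuming the lpeg module is installed; ikeyword is a hypothetical
helper name, not part of LPeg, and the escape codes inside double quotes
are not handled here.

```lua
local lpeg = require "lpeg"
local P, R, S, C = lpeg.P, lpeg.R, lpeg.S, lpeg.C

-- Hypothetical helper: build a pattern that matches `word` in any letter case.
local function ikeyword(word)
  local patt = P(true)                 -- empty pattern, always succeeds
  for ch in word:gmatch(".") do
    local lo, up = ch:lower(), ch:upper()
    if lo ~= up then
      patt = patt * S(lo .. up)        -- a letter: accept either case
    else
      patt = patt * P(ch)              -- not a letter: match it literally
    end
  end
  return patt
end

-- '=' with optional blanks around it, as in the EQ production quoted above
local EQ     = S(" \t")^0 * P("=") * S(" \t")^0
local length = ikeyword("length") * EQ * C(R("09")^1)

print(length:match("LeNgTh = 42"))     --> 42
```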

These kinds of rules make me think that the parsing should be done in
multiple stages, each stage parsing on the captures of the last one.
It seems to me that HTTP was designed to be parsed like this: first find
out where headers stop (CRLF + CRLF), then separate the headers from
one another (CRLF + non-space), then separate keywords from values
(':'), then fold any duplicate headers, then convert all whitespace to
a single space, then tokenize the values with a recursive parser
(because of the damn quoted-strings).
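The stages listed above can be sketched roughly as follows. This uses
plain Lua string functions rather than LPeg, only to keep each stage
visible. parse_headers is a hypothetical name; the input is assumed to
start at the first header line (no request or status line); duplicates
are assumed to fold by joining with ", "; and the final tokenizing stage
is left out.

```lua
-- Hypothetical helper: turn a raw header block into a name -> value table.
local function parse_headers(raw)
  -- Stage 1: the headers stop at the first empty line (CRLF + CRLF).
  local stop = raw:find("\r\n\r\n", 1, true)
  if not stop then return nil, "incomplete header block" end
  local block = raw:sub(1, stop + 1)   -- keep the CRLF ending the last header

  -- Stage 2: CRLF followed by space or tab continues the previous header,
  -- so after this only CRLF + non-space separates two headers.
  block = block:gsub("\r\n[ \t]+", " ")

  local headers = {}
  for line in block:gmatch("(.-)\r\n") do
    -- Stage 3: separate the keyword from the value at the first ':'.
    local name, value = line:match("^([^:]+):(.*)$")
    if name then
      name = name:lower()              -- keywords are case-insensitive
      -- Stage 5: convert whitespace runs to one space, trim both ends.
      value = value:gsub("%s+", " "):match("^ ?(.-) ?$")
      -- Stage 4: fold duplicate headers into a single value.
      if headers[name] then
        headers[name] = headers[name] .. ", " .. value
      else
        headers[name] = value
      end
    end
  end
  return headers
end

local h = parse_headers(
  "Accept: text/html\r\nX-Note: a\r\n\t b\r\naccept:   text/plain\r\n\r\nbody")
print(h["accept"])   --> text/html, text/plain
print(h["x-note"])   --> a b
```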

So what I want is the ability to apply a pattern to a capture, i.e. to
match on the capture some more, get other captures back in return, and
continue from there (so it's all done in-context). Either that, or a
different way of thinking about parsing that doesn't need a feature
like that.
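One way to get close to this in LPeg is a match-time capture: lpeg.Cmt
passes the captured text to a function, which can run a second pattern
on it and return new captures in its place. This is a minimal sketch,
assuming the lpeg module is installed; the inner grammar (comma-separated
words inside double quotes) is invented for illustration only.

```lua
local lpeg = require "lpeg"
local P, R, C, Ct, Cmt = lpeg.P, lpeg.R, lpeg.C, lpeg.Ct, lpeg.Cmt

-- Inner pattern, applied to a captured substring: comma-separated words.
local word  = C(R("az", "AZ")^1)
local inner = Ct(word * (P(",") * word)^0) * P(-1)  -- must consume all of it

-- Outer pattern: a double-quoted string whose content is captured first,
-- then parsed again by `inner` while the outer match is still running.
local content = C((1 - P('"'))^0)
local quoted  = P('"') * Cmt(content, function(subject, pos, text)
  local items = inner:match(text)
  if not items then return false end   -- inner failure fails the outer match
  return pos, items                    -- same position, capture replaced
end) * P('"')

local t = quoted:match('"a,bc,d"')
print(#t, t[1], t[2], t[3])            --> 3  a  bc  d
```

If the inner match does not need to be able to fail the outer one, a
function capture (C(patt) / function) is a simpler variant of the same
idea.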


> (What I really need to do is post my LPeg grammar for parsing email
> headers---it really showcases nearly all the features of the re module, but
> until I get around to that, if anyone is interested, I can mail them a copy;
> I should note that HTTP headers are pretty much the same format as email
> headers)

Please do, that could help me a lot. Thanks.

Follow-Ups:
- Re: help with lpeg, Sean Conner

References:
- help with lpeg, Cosmin Apreutesei
- Re: help with lpeg, Sean Conner
