On 14 September 2016 at 04:59, Sean Conner <sean@conman.org> wrote:
>
>   Seeing how there's going to be a bug fix for LPeg Real Soon Now (TM), I
> thought it might be a good time to float a proposal for a new lpeg function.
>
>   Some background:  I parse a lot of Internet related messages and URLs
> (email and SIP messages, sip:, tel:, http: and https: URLs, etc.) and it's
> amazing how often name/value pairs keep popping up.  Usually there are a
> fixed number of defined name/value pairs but the grammars almost always
> allow user defined pairs.  Since I use LPeg for all of my parsing needs, I
> like to parse the data into Lua tables and the most problematic part is
> handling open-ended name/value pairs.
>
>   Let me give a simplified example:  A simple file of name/value pairs
> (alpha characters only---I want to keep things really simple) one per line,
> name and value separated by an '=' sign; order does not matter.  There are
> two fields defined, "foo" and "bar" (which if not provided, default values
> will be given).  Two examples follow:
>
>         Example 1:
>                 foo=de
>                 bar=true
>                 alpha=Sean
>                 bravo=Conner
>
>         Example 2:
>                 yankee=Sean
>                 zulu=Conner
>                 foo=se
>
>   I would prefer to return a table like:
>
>         {
>           foo = "de",
>           bar = "true",
>           other =
>           {
>             alpha = "Sean",
>             bravo = "Conner"
>           }
>         }
>
>   It is not easy to get that.  If I do (assume everything defined):
>
>         -- for foo and bar, assume more error checking than you see here
>         foo   = P"foo"  * EQ * Cg(value,"foo") * EOL
>         bar   = P"bar"  * EQ * Cg(value,"bar") * EOL
>         other = C(name) * EQ * C(value)        * EOL
>         list  = Ct( -- CAPTURE INTO  A TABLE
>                       Cg(Cc"en","foo")    -- DEFAULT VALUE
>                     * Cg(Cc"false","bar") -- DEFAULT VALUE
>                     * (foo + bar + other)^0
>                   )
>
> I get:
>
>         {
>           [1] = "alpha",
>           [2] = "Sean",
>           [3] = "bravo",
>           [4] = "Conner",
>           bar = "true",
>           foo = "de"
>         }
>
> Nothing at all what I want.  The next solution is to use a folding capture:
>
>         foo   = Cg(C"foo"  * EQ * C(value)) * EOL
>         bar   = Cg(C"bar"  * EQ * C(value)) * EOL
>         other = Cg(C(name) * EQ * C(value)) * EOL
>
>         list = Cf( -- FOLDING CAPTURE
>                    Ct(Cc()) -- SEE [1]
>                    * Cg(Cc "foo" * Cc "en")
>                    * Cg(Cc "bar" * Cc "false")
>                    * (foo + bar + other)^0,
>                    function(t,n,v)
>                      t[n] = v
>                      return t
>                    end
>                  )
>
> It's closer to what I want, and certainly usable:
>
>         {
>           bravo = "Conner",
>           bar   = "true",
>           alpha = "Sean",
>           foo   = "de"
>         }
>
> and yes, I can complicate the folding function to stuff the non-standard
> headers into a sub table, but honestly, I'd rather not do that.

Hold on. This is the solution I use and I'm pretty happy with it.
Instead of your little helper function, you can use `rawset`, which is
a Lua builtin and does exactly what you wrote (except with raw
accesses). This is mentioned in the LPeg manual.

Furthermore, your 'foo' and 'bar' cases are redundant.
For clarity, I'd usually just mention them in a comment.

Rewriting your example:

local pair = Cg(C(name) * EQ * C(value) * EOL)
list = Cf(Ct(true) * pair^0, rawset)

Which isn't so bad, is it?
You'll find this type of thing all over my lpeg code.
e.g. https://github.com/daurnimator/lpeg_patterns/blob/12d46017c074aac8345a2e8dc41ca3c091404e92/lpeg_patterns/http.lua#L380
(just search that file for `Ct(true)`!)
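
For anyone who wants to try it, here is a self-contained sketch of the
above. The definitions of `name`, `value`, `EQ` and `EOL` are my own
assumptions (alpha characters only, '=' separator, newline-terminated
lines, as in the simplified example), and `list_with_defaults` is a
hypothetical name for a variant that keeps the default values:

```lua
local lpeg = require "lpeg"
local P, R, C, Cc = lpeg.P, lpeg.R, lpeg.C, lpeg.Cc
local Cg, Cf, Ct = lpeg.Cg, lpeg.Cf, lpeg.Ct

-- Assumed definitions; the original post leaves these undefined.
local alpha = R("az", "AZ")
local name  = alpha^1
local value = alpha^1
local EQ    = P"="
local EOL   = P"\n"

-- Each pair is a group producing two values: the name and the value.
local pair = Cg(C(name) * EQ * C(value) * EOL)

-- Fold every pair into a fresh table; rawset(t, n, v) returns t.
local list = Cf(Ct(true) * pair^0, rawset)

-- Variant that seeds the defaults before the parsed pairs are folded in,
-- so a parsed "foo" or "bar" simply overwrites the default.
local list_with_defaults = Cf(
  Ct(true)
  * Cg(Cc"foo" * Cc"en")
  * Cg(Cc"bar" * Cc"false")
  * pair^0,
  rawset)

local t1 = list:match("foo=de\nbar=true\nalpha=Sean\nbravo=Conner\n")
local t2 = list_with_defaults:match("yankee=Sean\nzulu=Conner\nfoo=se\n")
```

Here `t1` ends up with the four keys from Example 1, and `t2` gets
foo = "se" from the input plus the default bar = "false".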

> [1]     Doing a Cc{} doesn't work here, as that always returns the *same*
>         table across different parses (the table is created at compile time
>         and returned at runtime).  A bare Ct() fails as it expects a
>         pattern; thus, the call Ct(Cc()).  Cc() returns a pattern (satisfies
>         Ct()) of nothing and returns nil, which doesn't affect the table to
>         any degree.

Use `Ct(true)`.
The `true` is coerced to be the same as `P(true)`, which is a pattern
that always matches.
Hence it gives you a fresh table every time.
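
A quick way to see the difference, as a minimal sketch (the variable
names `shared` and `fresh` are just for illustration):

```lua
local lpeg = require "lpeg"
local Cc, Ct = lpeg.Cc, lpeg.Ct

-- Cc{} captures one constant table, created when the pattern is built.
local shared = Cc({})

-- Ct(true) matches the empty string and builds a new table per match.
local fresh = Ct(true)

local s1, s2 = shared:match(""), shared:match("")
local f1, f2 = fresh:match(""), fresh:match("")

assert(s1 == s2)  -- same table on every match
assert(f1 ~= f2)  -- distinct tables
```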