On 18 August 2018 at 10:33, Sean Conner <sean@conman.org> wrote:
>
>   Usually, I'm the one to answer LPeg questions, but tonight I need some
> help with LPeg, and I'm hoping someone might see something I'm missing.  It
> has to do with my URL parsing module [1].  The following code presents the
> bug:
>
> url  = require "org.conman.parsers.url.url"
> lpeg = require "lpeg"
>
> x = url * lpeg.Cp()
>
> a,b = x:match "/status"   print(b) -- prints 8, okay
> a,b = x:match "/status/"  print(b) -- prints 9, okay
> a,b = x:match "/status "  print(b) -- prints 8, okay
> a,b = x:match "/status/ " print(b) -- prints 8, WAT?
>
> The code in url that matters [2]:
>
> path_absolute   <- {| {:root: %istrue :}   '/' (segment_nz ('/' segment)* )? |}
> segment_nz      <- {~ pchar+ ~}
> segment         <- ! . / {~ pchar+ ~} -- NOTE
> pchar           <-  unreserved / pct_encoded / sub_delims / ':' / '@'
> pct_encoded     <- %pct_encoded
> sub_delims      <- '!' / '$' / '&' / "'" / '(' / ')'
>                 /  '*' / '+' / ',' / ';' / '='
> unreserved      <- %ALPHA / %DIGIT / '-' / '.' / '_' / '~'
>
> The 'segment' rule *should* be
>
> segment         <- ! . / {~ pchar* ~}
>
>   But fixing that issue doesn't resolve my current issue.  Why is the
> trailing slash, when followed by a space, not parsed as part of the URL?  I
> can work around the bug (for some use cases; see below for a possibly related
> issue) but it's annoying me that I can't seem to locate the issue.
>
>   Possibly related:
>
> a,b = x:match "/status#a" print(b)  -- prints 10 okay
> a,b = x:match "/status/#a" print(b) -- prints 8 WAT?
>
>   -spc (Puzzled by this ... )
>
> [1]     Installable as
>                 luarocks install org.conman.parsers.url.url
>         Also as part of
>                 https://github.com/spc476/LPeg-Parsers
>         viewable at:
>                 https://github.com/spc476/LPeg-Parsers/blob/9fe3db4c0a52264f9e0e78200cc0f7dda0008f04/url/url.lua
>
> [2]     The code is literally transcribed from RFC-3986.
>

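Your segment definition is incorrect: as written it means "either at the
end of the string, or at least one path character". After the trailing
slash in "/status/ " neither alternative matches (a space follows, and a
space is not a pchar), so the last ('/' segment) iteration fails and the
match ends before the slash.
It should instead be "any number of path characters", i.e.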

segment         <- {~ pchar* ~}
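
To see the difference in isolation, here is a minimal sketch using the plain
lpeg API rather than the url module. It only models path_absolute, and the
pchar below is a simplification (pct-encoded is left out); the names
path_absolute, broken and fixed are mine, not taken from either library.

```lua
local lpeg = require "lpeg"
local P, S, R, Cp = lpeg.P, lpeg.S, lpeg.R, lpeg.Cp

-- Simplified pchar (assumption): unreserved / sub-delims / ':' / '@',
-- without pct-encoded.
local pchar = R("az", "AZ", "09") + S"-._~!$&'()*+,;=:@"

-- '/' (segment_nz ('/' segment)*)? followed by a position capture,
-- parameterised on the segment rule so both variants can be compared.
local function path_absolute(segment)
  return P"/" * (pchar^1 * (P"/" * segment)^0)^-1 * Cp()
end

local broken = path_absolute(-P(1) + pchar^1) -- ! . / pchar+
local fixed  = path_absolute(pchar^0)         -- pchar*

print(broken:match "/status/")   --> 9
print(broken:match "/status/ ")  --> 8 (slash not consumed)
print(fixed:match  "/status/ ")  --> 9
print(fixed:match  "/status/#a") --> 9
```

With the broken rule the slash is only accepted when it is the very last
character or is followed by a pchar; with pchar* an empty segment is allowed
anywhere, which is what RFC 3986 specifies.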


PS, you should check out my version :)
https://github.com/daurnimator/lpeg_patterns/blob/master/lpeg_patterns/uri.lua