- Subject: Re: There's a bug in my LPeg code, but I can't find it
- From: Daurnimator <quae@...>
- Date: Sat, 18 Aug 2018 12:12:02 +1000
On 18 August 2018 at 10:33, Sean Conner <sean@conman.org> wrote:
>
> Usually, I'm the one to answer LPeg questions, but tonight I need some
> help with LPeg, and I'm hoping someone might see something I'm missing. It
> has to do with my URL parsing module [1]. The following code presents the
> bug:
>
> url = require "org.conman.parsers.url.url"
> lpeg = require "lpeg"
>
> x = url * lpeg.Cp()
>
> a,b = x:match "/status" print(b) -- prints 8, okay
> a,b = x:match "/status/" print(b) -- prints 9, okay
> a,b = x:match "/status " print(b) -- prints 8, okay
> a,b = x:match "/status/ " print(b) -- prints 8, WAT?
>
> The code in url that matters [2]:
>
> path_absolute <- {| {:root: %istrue :} '/' (segment_nz ('/' segment)* )? |}
> segment_nz <- {~ pchar+ ~}
> segment <- ! . / {~ pchar+ ~} -- NOTE
> pchar <- unreserved / pct_encoded / sub_delims / ':' / '@'
> pct_encoded <- %pct_encoded
> sub_delims <- '!' / '$' / '&' / "'" / '(' / ')'
>                   / '*' / '+' / ',' / ';' / '='
> unreserved <- %ALPHA / %DIGIT / '-' / '.' / '_' / '~'
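The behavior can be reproduced without installing the url module. The sketch below uses the `re` module that ships with LPeg and the path_absolute rules exactly as quoted. The %istrue, %pct_encoded, %ALPHA and %DIGIT definitions and the `endpos` helper are simplified stand-ins written for this example, not the module's real ones, so treat it as an approximation of the quoted grammar rather than the module itself.

```lua
-- Minimal reproduction of the quoted path_absolute rules.
-- Assumption: istrue, pct_encoded, ALPHA and DIGIT are simplified
-- stand-ins for this sketch, not the url module's own definitions.
local lpeg = require "lpeg"
local re   = require "re"

local HEX  = lpeg.R("09", "AF", "af")
local defs = {
  istrue      = lpeg.Cc(true),          -- constant capture, becomes root = true
  pct_encoded = lpeg.P"%" * HEX * HEX,  -- "%" HEXDIG HEXDIG
  ALPHA       = lpeg.R("AZ", "az"),
  DIGIT       = lpeg.R"09",
}

local path_absolute = re.compile([[
  path_absolute <- {| {:root: %istrue :} '/' (segment_nz ('/' segment)* )? |}
  segment_nz    <- {~ pchar+ ~}
  segment       <- ! . / {~ pchar+ ~}
  pchar         <- unreserved / pct_encoded / sub_delims / ':' / '@'
  pct_encoded   <- %pct_encoded
  sub_delims    <- '!' / '$' / '&' / "'" / '(' / ')'
                 / '*' / '+' / ',' / ';' / '='
  unreserved    <- %ALPHA / %DIGIT / '-' / '.' / '_' / '~'
]], defs)

local x = path_absolute * lpeg.Cp()

-- Hypothetical helper: the position captured by lpeg.Cp(), or nil on failure.
local function endpos (s)
  local _, pos = x:match(s)
  return pos
end

for _, s in ipairs { "/status", "/status/", "/status ", "/status/ ", "/status/#a" } do
  print(("%q"):format(s), endpos(s))
end
```

With these stand-ins the end positions come out as 8, 9, 8, 8 and 8. "/status#a" is left out because this path-only grammar has no fragment rule; what matters is that "/status/ " and "/status/#a" both stop just before the trailing '/'.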
>
> The 'segment' rule *should* be
>
> segment <- ! . / {~ pchar* ~}
>
> But fixing that doesn't resolve my current problem. Why is the
> trailing slash, when followed by a space, not parsed as part of the URL? I
> can work around the bug (for some use cases; see below for a possibly related
> issue), but it's annoying me that I can't seem to find the cause.
>
> Possibly related:
>
> a,b = x:match "/status#a" print(b) -- prints 10 okay
> a,b = x:match "/status/#a" print(b) -- prints 8 WAT?
>
> -spc (Puzzled by this ... )
>
> [1] Installable as
> luarocks install org.conman.parsers.url.url
> Also as part of
> https://github.com/spc476/LPeg-Parsers
> viewable at:
> https://github.com/spc476/LPeg-Parsers/blob/9fe3db4c0a52264f9e0e78200cc0f7dda0008f04/url/url.lua
>
> [2] The code is literally transcribed from RFC 3986.
>
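Your segment definition is incorrect: you have "either at the end of
the string, or at least one path character". After a '/' that is
followed by a space or a '#', neither alternative matches, so that
iteration of ('/' segment)* fails and LPeg backtracks to just before
the '/'.
It should instead be "any number of path characters", i.e.
segment <- {~ pchar* ~}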
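Applying that one-line change to a stripped-down copy of the grammar shows the trailing '/' being consumed. This is a sketch only: the %istrue, %pct_encoded, %ALPHA and %DIGIT definitions and the `endpos` helper are hypothetical stand-ins, and the full url module may behave differently once query and fragment rules come into play.

```lua
-- Sketch of the fix: segment <- {~ pchar* ~}   (RFC 3986: segment = *pchar)
-- Assumption: istrue, pct_encoded, ALPHA and DIGIT are simplified
-- stand-ins for this example, not the url module's own definitions.
local lpeg = require "lpeg"
local re   = require "re"

local HEX  = lpeg.R("09", "AF", "af")
local defs = {
  istrue      = lpeg.Cc(true),
  pct_encoded = lpeg.P"%" * HEX * HEX,
  ALPHA       = lpeg.R("AZ", "az"),
  DIGIT       = lpeg.R"09",
}

local path_absolute = re.compile([[
  path_absolute <- {| {:root: %istrue :} '/' (segment_nz ('/' segment)* )? |}
  segment_nz    <- {~ pchar+ ~}
  segment       <- {~ pchar* ~}
  pchar         <- unreserved / pct_encoded / sub_delims / ':' / '@'
  pct_encoded   <- %pct_encoded
  sub_delims    <- '!' / '$' / '&' / "'" / '(' / ')'
                 / '*' / '+' / ',' / ';' / '='
  unreserved    <- %ALPHA / %DIGIT / '-' / '.' / '_' / '~'
]], defs)

local x = path_absolute * lpeg.Cp()

-- Hypothetical helper: the position captured by lpeg.Cp(), or nil on failure.
local function endpos (s)
  local _, pos = x:match(s)
  return pos
end

-- A segment may now be empty, so a '/' before ' ' or '#' is kept.
print(endpos "/status/ ")   --> 9
print(endpos "/status/#a")  --> 9
```

One side effect worth knowing: an empty final segment now yields an empty-string capture, so "/status/" produces { root = true, "status", "" }, whereas the original rule produced no capture at all for that empty segment.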
PS: you should check out my version :)
https://github.com/daurnimator/lpeg_patterns/blob/master/lpeg_patterns/uri.lua