  Usually, I'm the one answering LPeg questions, but tonight I need some
help with LPeg myself, and I'm hoping someone might see something I'm
missing.  It has to do with my URL parsing module [1].  The following code
demonstrates the bug:

url  = require "org.conman.parsers.url.url"
lpeg = require "lpeg"

x = url * lpeg.Cp()

a,b = x:match "/status"   print(b) -- prints 8, okay
a,b = x:match "/status/"  print(b) -- prints 9, okay
a,b = x:match "/status "  print(b) -- prints 8, okay
a,b = x:match "/status/ " print(b) -- prints 8, WAT?

The code in url that matters [2]:

path_absolute   <- {| {:root: %istrue :}   '/' (segment_nz ('/' segment)* )? |}
segment_nz      <- {~ pchar+ ~}
segment         <- ! . / {~ pchar+ ~} -- NOTE
pchar           <-  unreserved / pct_encoded / sub_delims / ':' / '@'
pct_encoded     <- %pct_encoded
sub_delims      <- '!' / '$' / '&' / "'" / '(' / ')'
                /  '*' / '+' / ',' / ';' / '='
unreserved      <- %ALPHA / %DIGIT / '-' / '.' / '_' / '~'

The 'segment' rule *should* be:

segment         <- ! . / {~ pchar* ~}

  But making that change doesn't resolve my current issue.  Why is the
trailing slash, when followed by a space, not parsed as part of the URL?  I
can work around the bug (for some use cases; see below for a possibly related
issue), but it's annoying me that I can't locate the cause.

  Possibly related:

a,b = x:match "/status#a"  print(b) -- prints 10, okay
a,b = x:match "/status/#a" print(b) -- prints 8, WAT?
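
  For anyone who wants to poke at this without installing the module, here
is a reduced sketch using the re module that ships with LPeg.  It keeps only
the shape of the path_absolute and segment rules quoted above.  The names
"reduced" and "endpos" are made up for this example, pchar is narrowed to
alphanumerics, and the '#' handling is a guess at how the fragment is
attached, so none of this is the real grammar from url.lua.

```lua
local lpeg = require "lpeg"
local re   = require "re"

-- Reduced grammar: same shape as path_absolute/segment, nothing else.
-- ASSUMPTIONS: pchar is only [A-Za-z0-9], and the optional '#' part is a
-- stand-in for the fragment rule; neither is copied from the real module.
local reduced = re.compile [[
  url        <- path ('#' pchar*)?
  path       <- '/' (segment_nz ('/' segment)*)?
  segment_nz <- pchar+
  segment    <- !. / pchar+
  pchar      <- [A-Za-z0-9]
]] * lpeg.Cp()

-- Returns the position just past the matched text (the lpeg.Cp() value).
local function endpos(s)
  return reduced:match(s)
end

for _, s in ipairs { "/status", "/status/", "/status ", "/status/ ",
                     "/status#a", "/status/#a" } do
  print(("%q"):format(s), endpos(s))
end
```

  With the 'segment' rule written this way, the sketch gives me the same
positions as the full parser: 8, 9, 8, 8, 10 and 8.  In the reduced version
the '/' is given back whenever 'segment' fails on the character after it,
because the whole ('/' segment) iteration fails and the repetition stops
before the slash.  Whether that is also what is happening inside the full
module is exactly what I can't tell.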

  -spc (Puzzled by this ... )

[1]	Installable as
		luarocks install org.conman.parsers.url.url
	Also as part of
		https://github.com/spc476/LPeg-Parsers
	viewable at:
		https://github.com/spc476/LPeg-Parsers/blob/9fe3db4c0a52264f9e0e78200cc0f7dda0008f04/url/url.lua

[2]	The code is literally transcribed from RFC 3986.