- Subject: Re: Most succinct way to parse an HTTP header string
- From: Sean Conner <sean@...>
- Date: Thu, 29 Aug 2013 11:57:55 -0400
It was thus said that the Great Luiz Henrique de Figueiredo once stated:
> > I want to parse a HTTP header string "name:value" pair. In REXX this is
>
> This should work just fine:
>
> local name,value=string.match(line,"(.-):%s*(.-)$")
Actually, that may fail. According to RFC-2616, section 4.2:
Header fields can be extended over multiple lines by preceding each
extra line with at least one SP or HT.
So this is a valid header:
User-Agent: The Wizbang Frobulator 1.2p333
	(this is Microsoft Windows compatible. No, really!)
	(It also conforms to the Gecko layout engine)
	(and WebKit)
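To see the failure without LPeg, here is a minimal sketch in plain Lua. The name `unfold` is a hypothetical helper introduced only for this illustration, and the shortened sample header is an assumption; it joins each continuation line onto the header it extends before the pattern from the quoted message is applied.

```lua
-- Hypothetical helper: join continuation lines onto the header they extend.
local function unfold(text)
  -- a line break followed by one or more SP/HT becomes a single space;
  -- the parentheses drop gsub's second result (the substitution count)
  return (text:gsub("\r?\n[ \t]+", " "))
end

local raw = "User-Agent: The Wizbang Frobulator 1.2p333\r\n"
         .. "\t(and WebKit)\r\n"

-- Applied to a continuation line by itself, the pattern finds no colon
-- and returns nil.
print(string.match("\t(and WebKit)", "(.-):%s*(.-)$"))

-- After unfolding, each logical header sits on one line.
for line in unfold(raw):gmatch("[^\r\n]+") do
  local name, value = string.match(line, "(.-):%s*(.-)$")
  print(name, value)
end
```

A continuation line that happens to contain a colon (a URL, say) is worse than a nil: it would be split into a bogus name and value, which is why unfolding has to happen before splitting on the colon.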
Here's the code I use to parse headers [1]:
local lpeg = require "lpeg"
local P = lpeg.P
local S = lpeg.S
local C = lpeg.C
local Cf = lpeg.Cf
local Ct = lpeg.Ct
local Cg = lpeg.Cg
-- -------------------------------------------------------
-- This function will collapse repeated headers into a table,
-- but otherwise, the value will be a string
-- --------------------------------------------------------
local function doset(t,i,v)
  if t[i] == nil then
    t[i] = v
  elseif type(t[i]) == 'table' then
    t[i][#t[i]+1] = v
  else
    t[i] = { t[i] , v }
  end
  return t
end
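local crlf = P"\r"^-1 * P"\n"           -- line ending; the CR is optional
local lwsp = S" \t"                     -- linear whitespace: SP or HT
-- end of header: a line ending followed by a blank line, or a line
-- ending that does not start a continuation line
local eoh  = (crlf * #crlf) + (crlf - (crlf^-1 * lwsp))
local lws  = (crlf^-1 * lwsp)^0         -- optional (possibly folded) whitespace
local value = C((P(1) - eoh)^0) / function(v)
  -- the parentheses drop gsub's second result (the substitution count)
  return (v:gsub("[%s%c]+"," "))
end
local name   = C((P(1) - (P":" + crlf + lwsp))^1)
local header = Cg(name * ":" * lws * value * eoh)
headers      = Cf(Ct("") * header^1,doset) * crlf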
Given the following headers (terminated by a blank line, which the final crlf in the pattern requires):
Host: www.example.net
User-Agent: The Wizbang Frobulator 1.2p333
	(this is Microsoft Windows compatible. No, really!)
	(It also conforms to the Gecko layout engine)
	(and WebKit)
Accept: text/html;q=.9,
	text/plain;q=.5,
	text/*;q=0
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
"headers:match(text)" will return a table:
{
  ['Host'] = "www.example.net",
  ['User-Agent'] = "The Wizbang Frobulator 1.2p333 (this is Microsoft Windows compatible. No, really!) (It also conforms to the Gecko layout engine) (and WebKit)",
  ['Accept'] = "text/html;q=.9, text/plain;q=.5, text/*;q=0",
  ['Accept-Charset'] = "iso-8859-5, unicode-1-1;q=0.8",
}
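One consequence of doset is that a header seen once is stored as a string, while a repeated header becomes a table of strings, so callers have to check the type. A minimal sketch of that behavior in plain Lua follows; doset is copied from the code above so the example runs on its own, while `values_of` and the sample Set-Cookie values are hypothetical, added only for illustration.

```lua
-- Copy of doset from the code above so this sketch runs on its own.
local function doset(t,i,v)
  if t[i] == nil then
    t[i] = v
  elseif type(t[i]) == 'table' then
    t[i][#t[i]+1] = v
  else
    t[i] = { t[i] , v }
  end
  return t
end

-- Hypothetical helper: always hand back a list of values for a header.
local function values_of(t,name)
  local v = t[name]
  if v == nil then return {} end
  if type(v) == 'table' then return v end
  return { v }
end

local t = {}
doset(t,"Host","www.example.net")
doset(t,"Set-Cookie","a=1")
doset(t,"Set-Cookie","b=2")
doset(t,"Set-Cookie","c=3")

print(type(t["Host"]))        -- string
print(#t["Set-Cookie"])       -- 3
print(#values_of(t,"Host"))   -- 1
```

Wrapping lookups in something like `values_of` keeps the calling code from branching on the type every time, at the cost of allocating a small table for single-valued headers.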
-spc (man, that real world---it's sooooo messy)
[1] If I'm parsing email, I'll use:
https://github.com/spc476/LPeg-Parsers/blob/master/email.lua