It was thus said that the Great Luiz Henrique de Figueiredo once stated:
> > I want to parse a HTTP header string "name:value" pair. In REXX this is 
> 
> This should work just fine:
> 
> local name,value=string.match(line,"(.-):%s*(.-)$")

  Actually, that may fail.  According to RFC-2616, section 4.2:

	Header fields can be extended over multiple lines by preceding each
	extra line with at least one SP or HT. 

  So this is a valid header:

User-Agent: The Wizbang Frobulator 1.2p333
	(this is Microsoft Windows compatible.  No, really!)
	(It also conforms to the Gecko layout engine)
	(and WebKit)
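
  To see where the one-line pattern goes wrong, here is a small sketch
that applies it line by line to a folded header. The third line (with
the URL) is made up for illustration and is not part of the header
above; the table name "parsed" is also just a placeholder.

```lua
-- Sketch: apply the quoted one-line pattern to each physical line.
local pattern = "(.-):%s*(.-)$"

local lines = {
  "User-Agent: The Wizbang Frobulator 1.2p333",
  "\t(this is Microsoft Windows compatible.  No, really!)",
  "\t(see http://example.net/ for details)",  -- hypothetical continuation
}

local parsed = {}
for i, line in ipairs(lines) do
  local name, value = string.match(line, pattern)
  parsed[i] = { name = name, value = value }
  print(i, name, value)
end
```

  The first line splits as expected.  The second has no colon, so the
match fails and both results are nil.  The third is worse:  it "parses",
but the colon inside the URL is taken as the name/value separator.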

Here's the code I use to parse headers [1]:

local lpeg = require "lpeg"

local P  = lpeg.P
local S  = lpeg.S
local C  = lpeg.C
local Cf = lpeg.Cf
local Ct = lpeg.Ct
local Cg = lpeg.Cg

-- -------------------------------------------------------
-- This function will collapse repeated headers into a table,
-- but otherwise, the value will be a string
-- --------------------------------------------------------

local function doset(t,i,v)
  if t[i] == nil then
    t[i] = v
  elseif type(t[i]) == 'table' then
    t[i][#t[i]+1] = v
  else
    t[i] = { t[i] , v }
  end
  return t
end

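-- Line ending; the CR is optional so bare-LF input also parses
local crlf    = P"\r"^-1 * P"\n"
local lwsp    = S" \t"
-- End of a header: a line ending NOT followed by SP or HT (a line
-- ending followed by SP or HT is a continuation, not an end)
local eoh     = (crlf * #crlf) + (crlf - (crlf^-1 * lwsp))
-- Optional whitespace, possibly folded, between the colon and the value
local lws     = (crlf^-1 * lwsp)^0
-- The parentheses drop gsub's second result (the replacement count)
local value   = C((P(1) - eoh)^0) / function(v)
                                      return (v:gsub("[%s%c]+"," "))
                                    end
local name    = C((P(1) - (P":" + crlf + lwsp))^1)
local header  = Cg(name * ":" * lws * value * eoh)
-- Fold every (name,value) pair into one table; the trailing crlf is the
-- blank line that ends the header block
headers       = Cf(Ct("") * header^1,doset) * crlf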
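
  The string-or-table behaviour of doset() is easy to trip over, so
here is a minimal sketch of it in isolation.  doset() is copied from the
listing; the Set-Cookie values and the values_of() helper are my own
illustration, not part of the parser.

```lua
-- Copied from the listing above so this sketch runs on its own.
local function doset(t,i,v)
  if t[i] == nil then
    t[i] = v
  elseif type(t[i]) == 'table' then
    t[i][#t[i]+1] = v
  else
    t[i] = { t[i] , v }
  end
  return t
end

local t = {}
doset(t, "Host",       "www.example.net")
doset(t, "Set-Cookie", "a=1")   -- first occurrence: stored as a string
doset(t, "Set-Cookie", "b=2")   -- second: string becomes a two-item table
doset(t, "Set-Cookie", "c=3")   -- third: appended to that table

-- Hypothetical helper: always hand back a list, whatever doset stored.
local function values_of(tbl, name)
  local v = tbl[name]
  if v == nil then return {} end
  if type(v) == 'table' then return v end
  return { v }
end

print(type(t["Host"]), #values_of(t, "Set-Cookie"))
```

  Callers that do not care whether a header was repeated can go through
something like values_of() and skip the type() check at every use.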

Given the following headers:

Host: www.example.net
User-Agent: The Wizbang Frobulator 1.2p333
	(this is Microsoft Windows compatible.  No, really!)
	(It also conforms to the Gecko layout engine)
	(and WebKit)
Accept: text/html;q=.9, 
	text/plain;q=.5,
	text/*;q=0
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

"headers:match(text)" will return a table:

{
  ['Host']           = "www.example.net",
  ['User-Agent']     = "The Wizbang Frobulator 1.2p333 (this is Microsoft Windows compatible. No, really!) (It also conforms to the Gecko layout engine) (and WebKit)",
  ['Accept']         = "text/html;q=.9, text/plain;q=.5, text/*;q=0",
  ['Accept-Charset'] = "iso-8859-5, unicode-1-1;q=0.8"
}
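
  For comparison, the folding rule can also be approximated without
LPeg by unfolding the continuation lines first and then splitting each
logical line at its first colon.  This is only a rough sketch using
plain string patterns, not the code I use:  parse_headers() is a
made-up name, a repeated header simply overwrites the earlier value,
and nothing stops at the blank line that ends the headers.

```lua
-- Two of the headers from the example, with CRLF line endings and the
-- terminating blank line.
local text = table.concat({
  "Host: www.example.net",
  "Accept: text/html;q=.9, ",
  "\ttext/plain;q=.5,",
  "\ttext/*;q=0",
  "",
  "",
}, "\r\n")

local function parse_headers(raw)   -- hypothetical helper
  -- A line break followed by SP or HT continues the previous line.
  local unfolded = raw:gsub("\r?\n[ \t]+", " ")
  local result = {}
  for line in unfolded:gmatch("[^\r\n]+") do
    -- The name may not contain a colon or whitespace.
    local name, value = line:match("^([^:%s]+):%s*(.-)%s*$")
    if name then
      -- Collapse runs of whitespace left over from the unfolding.
      value = value:gsub("%s+", " ")
      result[name] = value
    end
  end
  return result
end

local h = parse_headers(text)
print(h["Host"])
print(h["Accept"])
```

  For these two headers it should produce the same values as the LPeg
version.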

  -spc (man, that real world---it's sooooo messy)

[1]	If I'm parsing email, I'll use:
	https://github.com/spc476/LPeg-Parsers/blob/master/email.lua