lua-l archive

  Seeing how there's going to be a bug fix for LPeg Real Soon Now (TM), I
thought it might be a good time to float a proposal for a new LPeg function.

  Some background: I parse a lot of Internet-related messages and URLs
(email and SIP messages; sip:, tel:, http: and https: URLs; etc.) and it's
amazing how often name/value pairs keep popping up. Usually there is a
fixed number of defined name/value pairs, but the grammars almost always
allow user-defined pairs as well. Since I use LPeg for all of my parsing
needs, I like to parse the data into Lua tables, and the most problematic
part is handling open-ended name/value pairs.

  Let me give a simplified example: a file of name/value pairs, one per line
(alphabetic characters only; I want to keep things really simple), with the
name and value separated by an '=' sign. Order does not matter. Two fields
are defined, "foo" and "bar", and each one gets a default value if it is not
provided. Two examples follow:

	Example 1:
		foo=de
		bar=true
		alpha=Sean
		bravo=Conner

	Example 2:
		yankee=Sean
		zulu=Conner
		foo=se

  For Example 1, I would prefer to get back a table like:

	{
	  foo = "de",
	  bar = "true",
	  other =
	  {
	    alpha = "Sean",
	    bravo = "Conner"
	  }
	}

  That is not easy to get. If I do this (assuming EQ, EOL, name and value
are defined as in the attached examples):

	-- for foo and bar, assume more error checking than you see here
	foo   = P"foo"  * EQ * Cg(value,"foo") * EOL
	bar   = P"bar"  * EQ * Cg(value,"bar") * EOL        
	other = C(name) * EQ * C(value)        * EOL
	list  = Ct( -- CAPTURE INTO  A TABLE
	              Cg(Cc"en","foo")	  -- DEFAULT VALUE
	            * Cg(Cc"false","bar") -- DEFAULT VALUE
	            * (foo + bar + other)^0
	          )

I get:

	{
	  [1] = "alpha",
	  [2] = "Sean",
	  [3] = "bravo",
	  [4] = "Conner",
	  bar = "true",
	  foo = "de"
	}

That is nothing like what I want. The next attempt uses a folding capture:

	foo   = Cg(C"foo"  * EQ * C(value)) * EOL
	bar   = Cg(C"bar"  * EQ * C(value)) * EOL
	other = Cg(C(name) * EQ * C(value)) * EOL

	list = Cf( -- FOLDING CAPTURE
	           Ct(Cc()) -- SEE [1]
	           * Cg(Cc "foo" * Cc "en")
	           * Cg(Cc "bar" * Cc "false")
	           * (foo + bar + other)^0,
	           function(t,n,v)
	             t[n] = v
	             return t
	           end 
	         )  

It's closer to what I want, and certainly usable:

	{
	  bravo = "Conner",
	  bar   = "true",
	  alpha = "Sean",
	  foo   = "de"
	}

and yes, I could complicate the folding function to stuff the non-standard
fields into a sub-table, but honestly, I'd rather not.
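For comparison, here is roughly what that more complicated folding function looks like. This is a sketch, not the author's code: it reuses the ALPHA/EQ/EOL/name/value definitions from the attached examples, `KNOWN` is a hypothetical lookup table of the defined fields, and the accumulator starts as `{ other = {} }`. It is built with Ct() so that every match gets a fresh table.

```lua
local lpeg = require "lpeg"
local C, Cc, Cf, Cg, Ct, P, R =
  lpeg.C, lpeg.Cc, lpeg.Cf, lpeg.Cg, lpeg.Ct, lpeg.P, lpeg.R

local ALPHA = R("AZ", "az")
local EQ    = P"="
local EOL   = P"\n"
local name  = ALPHA^1
local value = ALPHA^1

-- Hypothetical lookup of the fields the grammar defines;
-- every other name is filed under t.other.
local KNOWN = { foo = true, bar = true }

-- One generic rule: each line yields a single group capture (name, value).
local pair = Cg(C(name) * EQ * C(value)) * EOL

local list = Cf(
    Ct(Cg(Ct(Cc()), "other"))      -- fresh { other = {} } on every match
  * Cg(Cc"foo" * Cc"en")           -- default for foo
  * Cg(Cc"bar" * Cc"false")        -- default for bar
  * pair^0,
  function(t, n, v)
    if KNOWN[n] then
      t[n] = v
    else
      t.other[n] = v
    end
    return t
  end
)

local x = list:match("foo=de\nbar=true\nalpha=Sean\nbravo=Conner\n")
print(x.foo, x.bar, x.other.alpha, x.other.bravo)  -- de true Sean Conner (tab-separated)
```

This sketch replaces the separate `foo` and `bar` rules with one generic `pair` rule. Any field-specific validation would therefore have to move into the fold function, or be kept as separate alternatives placed ahead of `pair`.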

  I *can* get what I want with Carg():

	function set(t,name,val)  t[name] = val end
	function set2(t,name,val) t.other[name] = val end

	foo   = Cg(Carg(1) * C"foo" * EQ * C(value)) / set * EOL
	bar   = Cg(Carg(1) * C"bar" * EQ * C(value)) / set * EOL
	other = Cg(Carg(1) * C(name)  * EQ * C(value)) / set2 * EOL
	list  = (Cg(Carg(1) * Cc "foo" * Cc "en")    / set)
	      * (Cg(Carg(1) * Cc "bar" * Cc "false") / set)
	      * (Cg(Carg(1) * Ct(Cc())) / function(t,h) t.other = h end)
	      * (foo + bar + other)^0
	      * Carg(1)

but at the expense of a more complicated invocation:

	x = list:match(data,1,{})

instead of nicer (to me):

	x = list:match(data)
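If the only objection to the Carg() version is the call site, a small closure can supply the fresh table so that callers pass only the subject. The sketch below is not part of LPeg. `parser` is a hypothetical helper name, and the grammar repeats the Carg(1) version with explicit parentheses around each function capture.

```lua
local lpeg = require "lpeg"
local C, Carg, Cc, Cg, Ct, P, R =
  lpeg.C, lpeg.Carg, lpeg.Cc, lpeg.Cg, lpeg.Ct, lpeg.P, lpeg.R

local ALPHA = R("AZ", "az")
local EQ    = P"="
local EOL   = P"\n"
local name  = ALPHA^1
local value = ALPHA^1

local function set(t, n, v)  t[n] = v end
local function set2(t, n, v) t.other[n] = v end

-- Same Carg(1) grammar as above; the parentheses make it explicit that
-- each function capture wraps only its own group.
local foo   = (Cg(Carg(1) * C"foo"  * EQ * C(value)) / set)  * EOL
local bar   = (Cg(Carg(1) * C"bar"  * EQ * C(value)) / set)  * EOL
local other = (Cg(Carg(1) * C(name) * EQ * C(value)) / set2) * EOL
local list  = (Cg(Carg(1) * Cc"foo" * Cc"en")    / set)
            * (Cg(Carg(1) * Cc"bar" * Cc"false") / set)
            * (Cg(Carg(1) * Ct(Cc())) / function(t, h) t.other = h end)
            * (foo + bar + other)^0
            * Carg(1)

-- Hypothetical helper: returns a function that passes a brand-new table
-- as extra argument 1 on every call, hiding Carg(1) from the caller.
local function parser(patt)
  return function(subject, init)
    return patt:match(subject, init or 1, {})
  end
end

local parse = parser(list)
local x = parse("yankee=Sean\nzulu=Conner\nfoo=se\n")
print(x.foo, x.bar, x.other.zulu)  -- se false Conner (tab-separated)
```

The trade-off is that `parse` is a plain Lua function, not an LPeg pattern. Any larger grammar that embeds `list` must still be matched with a table as extra argument 1.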

  Finally, we get to the proposal: an LPeg function that returns the table
being built by the enclosing Ct(), which I'm calling Ctab() (I'm not wedded
to that name). It would work like this:

	foo   = P"foo" * EQ * Cg(value,"foo") * EOL
	bar   = P"bar" * EQ * Cg(value,"bar") * EOL
	other = Cg(Ctab() * C(name) * EQ * C(value))
	--	   ^^^^^^ return table created by Ct()
	      / function(t,n,v)
		  t.other[n] = v
	        end
	      * EOL
	list = Ct(
		     Cg(Cc"en","foo")
		   * Cg(Cc"false","bar")
		   * Cg(Ct(Cc()),"other")
		   * (foo + bar + other)^0
		 )

  While I could probably add the function myself, I'd rather not have to
rely upon a custom version of LPeg for parsing modules I write.
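For comparison, here is a stock-LPeg approximation. It is a sketch, not part of the proposal: it keeps the first Ct() attempt and repairs the result with a function capture. `reshape` is a hypothetical name; it moves the alternating name/value strings that Ct() left in the array part into `other`.

```lua
local lpeg = require "lpeg"
local C, Cc, Cg, Ct, P, R = lpeg.C, lpeg.Cc, lpeg.Cg, lpeg.Ct, lpeg.P, lpeg.R

local ALPHA = R("AZ", "az")
local EQ    = P"="
local EOL   = P"\n"
local name  = ALPHA^1
local value = ALPHA^1

local foo   = P"foo" * EQ * Cg(value, "foo") * EOL
local bar   = P"bar" * EQ * Cg(value, "bar") * EOL
local other = C(name) * EQ * C(value) * EOL

-- Hypothetical post-processor: t[1], t[2], t[3], t[4], ... hold
-- name, value, name, value, ...; move them into a t.other sub-table.
local function reshape(t)
  local o = {}
  for i = 1, #t, 2 do
    o[t[i]] = t[i + 1]
  end
  for i = #t, 1, -1 do
    t[i] = nil
  end
  t.other = o
  return t
end

local list = Ct(
    Cg(Cc"en", "foo")
  * Cg(Cc"false", "bar")
  * (foo + bar + other)^0
) / reshape

local x = list:match("foo=de\nbar=true\nalpha=Sean\nbravo=Conner\n")
print(x.foo, x.bar, x.other.alpha, x.other.bravo)
```

This relies on every unnamed capture inside the Ct() coming in name/value pairs. A rule that produced an odd number of unnamed values would silently misalign the pairs.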

  -spc (I'm including working examples of the above)

[1]	Cc{} doesn't work here, as it always returns the *same* table
	across different parses (the table is created when the pattern is
	built and handed back at match time). A bare Ct() fails because it
	expects a pattern; hence the call Ct(Cc()). Cc() with no arguments
	is a pattern that matches the empty string and produces no values,
	so it satisfies Ct() without adding anything to the table.
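The point of footnote [1] can be checked directly. This small sketch compares a constant-capture table with a table capture across two matches of the empty string.

```lua
local lpeg = require "lpeg"

-- Cc{} stores one table in the pattern when the pattern is built,
-- so every match hands back that same table.
local shared = lpeg.Cc{}

-- Ct() builds its table at match time; Cc() matches the empty
-- string and produces no values, so each new table stays empty.
local fresh = lpeg.Ct(lpeg.Cc())

local a, b = shared:match(""), shared:match("")
local c, d = fresh:match(""), fresh:match("")
print(a == b)  -- true: the same table both times
print(c == d)  -- false: a new table per match
```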
-- Example 1: plain Ct() capture (unnamed pairs land in the array part)
lpeg = require "lpeg"

Cg = lpeg.Cg
Ct = lpeg.Ct
Cc = lpeg.Cc
C  = lpeg.C
P  = lpeg.P
R  = lpeg.R

function dump(name,t)
  print(name)
  for n,v in pairs(t) do
    print("",n,v)
  end
  print()
end

test1 = [[
foo=de
bar=true
alpha=Sean
bravo=Conner
]]

test2 = [[
yankee=Sean
zulu=Conner
foo=se
]]

ALPHA = R("AZ","az")
EQ    = P"="
EOL   = P"\n"
name  = ALPHA^1
value = ALPHA^1

foo   = P"foo"  * EQ * Cg(value,"foo") * EOL
bar   = P"bar"  * EQ * Cg(value,"bar") * EOL
other = C(name) * EQ * C(value)        * EOL
list  = Ct(
	      Cg(Cc"en","foo")
	    * Cg(Cc"false","bar")
	    * (foo + bar + other)^0
          )

x = list:match(test1)
dump("x",x)

x = list:match(test2)
dump("x",x)
-- Example 2: folding capture with Cf()
lpeg = require "lpeg"

Ct = lpeg.Ct
Cg = lpeg.Cg
Cf = lpeg.Cf
Cc = lpeg.Cc
C  = lpeg.C
P  = lpeg.P
R  = lpeg.R

function dump(name,t)
  print(name)
  for n,v in pairs(t) do
    print("",n,v)
  end
  print()
end

test1 = [[
foo=de
bar=true
alpha=Sean
bravo=Conner
]]

test2 = [[
yankee=Sean
zulu=Conner
foo=se
]]

ALPHA = R("AZ","az")
EQ    = P"="
EOL   = P"\n"
name  = ALPHA^1
value = ALPHA^1

foo   = Cg(C"foo"  * EQ * C(value)) * EOL
bar   = Cg(C"bar"  * EQ * C(value)) * EOL
other = Cg(C(name) * EQ * C(value)) * EOL

list = Cf(
           Ct(Cc()) 
           * Cg(Cc "foo" * Cc "en")
           * Cg(Cc "bar" * Cc "false")
           * (foo + bar + other)^0,
           function(t,n,v)
             t[n] = v
             return t
           end
         )

x = list:match(test1)
dump("x",x)

x = list:match(test2)
dump("x",x)
-- Example 3: passing the result table in through Carg(1)
lpeg = require "lpeg"

Carg = lpeg.Carg
Cg   = lpeg.Cg
Ct   = lpeg.Ct
Cc   = lpeg.Cc
C    = lpeg.C
P    = lpeg.P
R    = lpeg.R

function dump(name,t)
  print(name)
  for n,v in pairs(t) do
    print("",n,v)
  end
  print()
end

test1 = [[
foo=de
bar=true
alpha=Sean
bravo=Conner
]]

test2 = [[
yankee=Sean
zulu=Conner
foo=se
]]

ALPHA = R("AZ","az")
EQ    = P"="
EOL   = P"\n"
name  = ALPHA^1
value = ALPHA^1

function set(t,name,val)
  t[name] = val
end

function set2(t,name,val)
  t.other[name] = val
end

foo   = Cg(Carg(1) * C"foo" * EQ * C(value)) / set * EOL
bar   = Cg(Carg(1) * C"bar" * EQ * C(value)) / set * EOL
other = Cg(Carg(1) * C(name)  * EQ * C(value)) / set2 * EOL
list  = (Cg(Carg(1) * Cc "foo" * Cc "en")    / set)
      * (Cg(Carg(1) * Cc "bar" * Cc "false") / set)
      * (Cg(Carg(1) * Ct(Cc())) / function(t,h) t.other = h end)
      * (foo + bar + other)^0
      * Carg(1)

x = list:match(test1,1,{})
dump("x",x)

x = list:match(test2,1,{})
dump("x",x)