It was thus said that the Great joy mondal once stated:
> Hi Spc,

  Hi.

> So essentially what you are saying is the '/' function syntax is just
> syntax sugar ? without having much value to creating a parser ?

  Not necessarily.

  First off, the documentation for LPEG [1] does cover all of LPEG but,
like the Lua documentation, it can be terse.

  Second, '/' is documented in the Capture subsection, so the result of '/'
is to produce a capture.  The expression:

	num = lpeg.R"09"^1 / tonumber

will match digits, then those digits are passed to the function tonumber(),
which converts a string to a number.  It's this number that is returned.  An
example:

	num  = lpeg.R"09"^1
	SP   = lpeg.P" "
	patt = lpeg.Ct((num * SP^-1)^0)

	dump('result',patt:match"1 2 3 4") -- just dumps a table
	result =
	{
	}

num doesn't return any captures, so nothing is captured into the table
returned by lpeg.Ct().  Now, let's capture the output of num (I'm only
changing the rule for num---the rest stays the same, except for the output
which I'm showing):

	num = lpeg.C(lpeg.R"09"^1)

	result =
	{
	  [1] = "1",
	  [2] = "2",
	  [3] = "3",
	  [4] = "4",
	}

This captures the digits as strings.  If we wanted to convert these to
numbers, that's when '/' comes in:

	num = lpeg.R"09"^1 / tonumber

	result =
	{
	  [1] = 1.000000,
	  [2] = 2.000000,
	  [3] = 3.000000,
	  [4] = 4.000000,
	}

We now get actual numbers.  You *can* do the same thing with lpeg.Cmt():

	num = lpeg.Cmt(lpeg.R"09"^1,function(_,position,capture)
	  return position,tonumber(capture)
	end)

	result =
	{
	  [1] = 1.000000,
	  [2] = 2.000000,
	  [3] = 3.000000,
	  [4] = 4.000000,
	}

but you aren't really buying anything in this example, other than being a
bit more verbose (or explicit).
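
  For anyone who wants to run the four variants of num side by side, here
is a minimal sketch.  It assumes LPEG is installed and loadable as "lpeg";
the helper list_of() and the variable names are made up for illustration,
and dump() is left out in favor of a plain print():

```lua
local lpeg = require "lpeg"  -- assumption: LPEG is installed

local digits = lpeg.R"09"^1
local SP     = lpeg.P" "

-- Hypothetical helper: wrap a rule for num in the list pattern used above.
local function list_of(num)
  return lpeg.Ct((num * SP^-1)^0)
end

-- No capture at all: the table comes back empty.
local none    = list_of(digits):match"1 2 3 4"
-- Simple capture: the digits arrive as strings.
local strings = list_of(lpeg.C(digits)):match"1 2 3 4"
-- Function capture: the matched text is passed through tonumber().
local numbers = list_of(digits / tonumber):match"1 2 3 4"
-- Match-time capture: same result, more ceremony.
local viacmt  = list_of(lpeg.Cmt(digits, function(_, position, capture)
  return position, tonumber(capture)
end)):match"1 2 3 4"

print(#none, type(strings[1]), type(numbers[1]), type(viacmt[1]))
```

How the numbers print (1 versus 1.000000) depends on the Lua version and
on how the table is dumped, so the sketch only prints the types.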

  Here's another example of using '/':

	char = lpeg.P"\n" / "\\n"
	     + lpeg.P"\t" / "\\t"
	     + lpeg.P(1)
	safe = lpeg.Cs(char^0)

  Here I'm doing a substitution capture on the input string.  For each
character in the string, if it's a newline character, replace it with the
escaped version '\n'; the same for the tab character.  Here, the newline
character is replaced with a string using the '/' operator.  Again, you
could do this with lpeg.Cmt() but it would lose some clarity:

	char = lpeg.Cmt(lpeg.P"\n",function(_,position) return position,"\\n" end)
	     + lpeg.Cmt(lpeg.P"\t",function(_,position) return position,"\\t" end)
	     + lpeg.P(1)
	safe = lpeg.Cs(char^0)
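
  A runnable sketch of the '/' version of this substitution, assuming LPEG
is installed and loadable as "lpeg" (the sample input string and the name
escaped are made up for illustration):

```lua
local lpeg = require "lpeg"  -- assumption: LPEG is installed

-- Replace newline and tab with their escaped spellings; keep the rest.
local char = lpeg.P"\n" / "\\n"
           + lpeg.P"\t" / "\\t"
           + lpeg.P(1)
local safe = lpeg.Cs(char^0)

-- The input holds a real tab and a real newline.
local escaped = safe:match("one\ttwo\nthree")
print(escaped)
```

The result contains a backslash followed by 't' (or 'n') wherever the
input had a tab (or newline) character.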

  So I suppose you could say that '/' is syntactic sugar for lpeg.Cmt(), in
that everything you can do with '/' you can do with lpeg.Cmt().  But I find
using '/' clearer than using lpeg.Cmt().  That's not to say I don't use
lpeg.Cmt(), but only when I need to do some other processing at match time.
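
  As an illustration of "processing at match time", here is a sketch of a
case where the function decides whether the match succeeds at all, which
'/' cannot do.  The rule octet and its 0..255 limit are hypothetical
examples, not something from the original question; LPEG is assumed to be
installed and loadable as "lpeg":

```lua
local lpeg = require "lpeg"  -- assumption: LPEG is installed

-- Hypothetical rule: a decimal number that must fit in one byte.
local octet = lpeg.Cmt(lpeg.R"09"^1, function(_, position, capture)
  local n = tonumber(capture)
  if n <= 255 then
    return position, n  -- accept: continue at position, produce n
  end
  return false          -- reject: the pattern fails to match here
end)

local ok  = octet:match"200"  -- in range, so the number is returned
local bad = octet:match"999"  -- out of range, so the match fails
print(ok, bad)
```

With '/' the function runs only after the whole match has succeeded, so it
can transform a value but cannot veto the match the way this one does.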

> I was stuck trying to use Cb ( back referencing ) and Cg - which are
> confusing.
> 
> Then I read that Cb is experimental.

  It was at one point, but that doesn't seem to be the case anymore.  I
generally use Cg() in conjunction with Ct(); I think I've used Cb() once
when parsing text that had variable delimiters.
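
  A sketch of both uses, assuming LPEG is installed and loadable as "lpeg".
The key=value rule is a made-up example of Cg() with Ct(); the second part
is adapted from the Lua long-string example in the LPEG documentation [1],
where the closing delimiter must carry as many '=' as the opening one:

```lua
local lpeg = require "lpeg"  -- assumption: LPEG is installed

-- Cg() with Ct(): each named group becomes a field of the table.
local key   = lpeg.C(lpeg.R("az", "AZ")^1)
local value = lpeg.R"09"^1 / tonumber
local pair  = lpeg.Ct(lpeg.Cg(key, "key") * lpeg.P"=" * lpeg.Cg(value, "value"))
local t     = pair:match"width=80"

-- Cb(): remember the opening delimiter and compare it when closing.
local equals  = lpeg.P"="^0
local open    = lpeg.P"[" * lpeg.Cg(lpeg.C(equals), "init") * lpeg.P"["
local close   = lpeg.P"]" * lpeg.C(equals) * lpeg.P"]"
-- A close only counts if its '=' run equals the one saved as "init".
local closeeq = lpeg.Cmt(close * lpeg.Cb"init", function(subject, position, a, b)
  return a == b
end)
-- "/ 1" keeps only the first capture, which is the body of the string.
local long    = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close / 1
local body    = long:match"[==[a]=]b]==]"

print(t.key, t.value, body)
```

The inner "]=]" is skipped because its single '=' does not match the two
saved under "init"; only "]==]" ends the string.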

  -spc

[1]	http://www.inf.puc-rio.br/~roberto/lpeg/