- Subject: Re: Elegant design for creating error messages in LPEG parser
- From: Sean Conner <sean@...>
- Date: Thu, 4 Apr 2019 00:10:52 -0400
It was thus said that the Great joy mondal once stated:
> Hi Spc,
Hi.
> So essentially what you are saying is the '/' function syntax is just
> syntax sugar ? without having much value to creating a parser ?
Not necessarily.
First off, the documentation for LPEG [1] does cover all of LPEG but,
like the Lua documentation, it can be terse.
Second, '/' is documented in the Capture subsection, so the result of '/'
is to produce a capture. The expression:
    num = lpeg.R"09"^1 / tonumber
will match digits, then those digits are passed to the function tonumber(),
which converts a string to a number. It's this number that is returned. An
example:
    num = lpeg.R"09"^1
    SP = lpeg.P" "
    patt = lpeg.Ct((num * SP^-1)^0)
    dump('result',patt:match"1 2 3 4") -- just dumps a table

    result =
    {
    }
num doesn't produce any captures, so nothing is captured into the table
returned by lpeg.Ct(). Now, let's capture the output of num (I'm only
changing the rule for num---the rest stays the same, except for the output
which I'm showing):
    num = lpeg.C(lpeg.R"09"^1)

    result =
    {
      [1] = "1",
      [2] = "2",
      [3] = "3",
      [4] = "4",
    }
This captures the digits as strings. If we wanted to convert these to
numbers, that's when '/' comes in:
    num = lpeg.R"09"^1 / tonumber

    result =
    {
      [1] = 1.000000,
      [2] = 2.000000,
      [3] = 3.000000,
      [4] = 4.000000,
    }
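If you want to run the three variants of num side by side, here is a minimal self-contained sketch. It assumes LPEG is installed and loadable with require "lpeg"; collect() is a hypothetical helper of mine (not part of LPEG) that stands in for the dump() call above.

```lua
-- Assumes LPEG is installed and loadable as "lpeg".
local lpeg = require "lpeg"

local SP = lpeg.P" "

-- collect() is a hypothetical helper: it wraps a num rule in the same
-- table capture used above and returns the resulting table.
local function collect(num, input)
  local patt = lpeg.Ct((num * SP^-1)^0)
  return patt:match(input)
end

local digits = lpeg.R"09"^1

local none    = collect(digits, "1 2 3 4")            -- no captures
local strings = collect(lpeg.C(digits), "1 2 3 4")    -- string captures
local numbers = collect(digits / tonumber, "1 2 3 4") -- converted by tonumber

print(#none, type(strings[1]), type(numbers[1]))
```

The first table should be empty, the second should hold strings, and the third should hold numbers.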
We now get actual numbers. You *can* do the same thing with lpeg.Cmt():
    num = lpeg.Cmt(lpeg.R"09"^1,function(_,position,capture)
      return position,tonumber(capture)
    end)

    result =
    {
      [1] = 1.000000,
      [2] = 2.000000,
      [3] = 3.000000,
      [4] = 4.000000,
    }
but you aren't really buying anything in this example, other than being a
bit more verbose (or explicit).
Here's another example of using '/':
    char = lpeg.P"\n" / "\\n"
         + lpeg.P"\t" / "\\t"
         + lpeg.P(1)
    safe = lpeg.Cs(char^0)
Here I'm doing a substitution capture on the input string. For each
character in the string, if it's a newline character, replace it with the
escaped version '\n'; the same for the tab character. Here, the newline
character is replaced with a string using the '/' operator. Again, you
could do this with lpeg.Cmt() but it would lose some clarity:
    char = lpeg.Cmt(lpeg.P"\n",function(_,position) return position,"\\n" end)
         + lpeg.Cmt(lpeg.P"\t",function(_,position) return position,"\\t" end)
         + lpeg.P(1)
    safe = lpeg.Cs(char^0)
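To check that the two versions really behave the same, a small sketch like the following can be run. It assumes LPEG is loadable with require "lpeg"; the names char_div, char_cmt, safe_div, safe_cmt and input are only illustrative.

```lua
local lpeg = require "lpeg"

-- Version using '/' with a replacement string.
local char_div = lpeg.P"\n" / "\\n"
               + lpeg.P"\t" / "\\t"
               + lpeg.P(1)
local safe_div = lpeg.Cs(char_div^0)

-- Version using lpeg.Cmt(); each function returns the new position plus
-- the replacement string as its capture value.
local char_cmt = lpeg.Cmt(lpeg.P"\n", function(_, position) return position, "\\n" end)
               + lpeg.Cmt(lpeg.P"\t", function(_, position) return position, "\\t" end)
               + lpeg.P(1)
local safe_cmt = lpeg.Cs(char_cmt^0)

-- Sample input containing a real tab and a real newline.
local input = "one\ttwo\nthree"
print(safe_div:match(input))
print(safe_cmt:match(input))
```

Both lines printed should show the tab and newline as the two-character escapes \t and \n.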
So I suppose you could say that '/' is syntactic sugar for lpeg.Cmt(), in
that everything you can do with '/' you can do with lpeg.Cmt(). But I find
using '/' clearer than using lpeg.Cmt(). That's not to say I don't use
lpeg.Cmt(), but I use it only when I need to do some other processing at
match time.
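As an illustration of processing at match time, here is a minimal sketch where the function decides whether the match succeeds at all, which a function attached with '/' cannot do since it only runs after the match has succeeded. The octet rule is a made-up example of mine, and the sketch assumes LPEG is loadable with require "lpeg".

```lua
local lpeg = require "lpeg"

-- octet is a hypothetical rule: a run of digits that only matches when
-- its numeric value is in the range 0..255.
local octet = lpeg.Cmt(lpeg.R"09"^1, function(_, position, capture)
  local n = tonumber(capture)
  if n <= 255 then
    return position, n   -- accept, and produce the number as the capture
  end
  return false           -- reject: the match fails at this point
end)

print(octet:match("200"))  -- in range, so the match succeeds
print(octet:match("300"))  -- out of range, so the match fails
```

Returning false from the function makes the pattern fail, so the parser can backtrack and try another alternative.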
> I was stuck trying to use Cb ( back referencing ) and Cg - which are
> confusing.
>
> Then I read that Cb is experimental.
It was at one point, but that doesn't seem to be the case anymore. I
generally use Cg() in conjunction with Ct(); I think I've used Cb() once
when parsing text that had variable delimiters.
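To make those two uses concrete, here is a rough sketch. It is not the code I actually wrote; the pair and quoted rules and all the local names are invented for illustration, and it assumes LPEG is loadable with require "lpeg". The Cb() part follows the same idea as the long-string example in the LPEG documentation [1].

```lua
local lpeg = require "lpeg"

-- Cg() with a name inside Ct() stores the capture under that key.
local name  = lpeg.C(lpeg.R("az", "AZ")^1)
local value = lpeg.R"09"^1 / tonumber
local pair  = lpeg.Ct(lpeg.Cg(name, "key") * "=" * lpeg.Cg(value, "value"))

local t = pair:match("width=80")
print(t.key, t.value)

-- Cb() retrieves an earlier named group. Here the opening quote character
-- is remembered so the string must be closed by the same character.
local quote = lpeg.S[['"]]
local open  = lpeg.Cg(quote, "q")
local close = lpeg.Cmt(lpeg.C(quote) * lpeg.Cb("q"), function(_, position, c, q)
  return c == q   -- true only when the closing quote equals the opening one
end)
local quoted = open * lpeg.C((lpeg.P(1) - close)^0) * close

print(quoted:match([["it's"]]))
```

The named groups become fields of the table, and the quoted rule accepts either quote character as long as the closing one matches the opening one.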
-spc
[1] http://www.inf.puc-rio.br/~roberto/lpeg/