- Subject: Lpeg grammar questions
- From: Simon Cozens <simon@...>
- Date: Wed, 12 Feb 2014 20:03:27 +0900
Hello! I am nearing the end of my first major Lua project, but I have got stuck
with LPeg.
(I have also asked this question on SO but with no response so far. If you
play the SO game and want to get some more points you could answer at
http://stackoverflow.com/questions/21622316/parsing-a-tex-like-language-with-lpeg
...)
I have managed to produce one grammar which does what I want, but I have been
beating my head against this one and not getting far. The idea is to parse a
document which is a simplified form of TeX. I want to split a document into:
* Environments, which are \begin{cmd} and \end{cmd} pairs.
* Commands which can either take an argument like so: \foo{bar} or can be
bare: \foo.
* Both environments and commands can have parameters like so:
\command[color=green,background=blue]{content}.
* Other stuff.
I also would like to keep track of line number information for error handling
purposes. Here's what I have so far:
lpeg = require("lpeg")
lpeg.locale(lpeg)
-- Assume a lot of "X = lpeg.X" here.
-- Line number handling from http://lua-users.org/lists/lua-l/2011-05/msg00607.html
-- with additional print statements to check they are working.
local newline = P"\r"^-1 * "\n" / function (a) print("New"); end
local incrementline = Cg( Cb"linenum" )/ function ( a ) print("NL"); return a + 1 end , "linenum"
local setup = Cg ( Cc ( 1) , "linenum" )
nl = newline * incrementline
space = nl + lpeg.space
-- Taken from "Name-value lists" in http://www.inf.puc-rio.br/~roberto/lpeg/
local identifier = (R("AZ") + R("az") + P("_") + R("09"))^1
local sep = lpeg.S(",;") * space^0
local value = (1-lpeg.S(",;]"))^1
local pair = lpeg.Cg(C(identifier) * space ^0 * "=" * space ^0 * C(value)) * sep^-1
local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
local parameters = (P("[") * list * P("]")) ^-1
-- And the rest is mine
anything = C( (space^1 + (1-lpeg.S("\\{}")) )^1) * Cb("linenum") /
  function (a,b) return { text = a, line = b } end
begin_environment = P("\\begin") * Ct(parameters) * P("{") *
  Cg(identifier, "environment") * Cb("environment") * P("}") /
  function (a,b) return { params = a[1], environment = b } end
end_environment = P("\\end{") * Cg(identifier) * P("}")
texlike = lpeg.P{
  "document";
  document = setup * V("stuff") * -1,
  stuff = Cg(V"environment" + anything + V"bracketed_stuff" + V"command_with"
    + V"command_without")^0,
  bracketed_stuff = P"{" * V"stuff" * P"}" / function (a) return a end,
  command_with = ((P("\\") * Cg(identifier) * Ct(parameters) *
    Ct(V"bracketed_stuff")) - P("\\end{")) /
    function (i,p,n) return { command = i, parameters = p, nodes = n } end,
  command_without = ((P("\\") * Cg(identifier) * Ct(parameters)) - P("\\end{")) /
    function (i,p) return { command = i, parameters = p } end,
  environment = Cg(begin_environment * Ct(V("stuff")) * end_environment) /
    function (b,stuff, e) return { b = b, stuff = stuff, e = e} end
}
It almost works!
> texlike:match("\\foo[one=two]thing\\bar")
{
  command = "foo",
  parameters = {
    {
      one = "two",
    },
  },
}
{
  line = 1,
  text = "thing",
}
{
  command = "bar",
  parameters = {
  },
}
But! First, I can't get the line number handling part to work at all. The
function within incrementline is never fired.
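For comparison, here is a minimal sketch of a different, position-based way to get line numbers. It is not the technique from the linked lua-l message: it assumes the grammar records byte offsets (for example with lpeg.Cp()) and converts them to line numbers after the match. The helper name line_of is hypothetical.

```lua
-- Hypothetical helper: convert a 1-based byte offset into a 1-based line
-- number by counting the "\n" characters that occur before that offset.
local function line_of(subject, pos)
  local _, newlines = subject:sub(1, pos - 1):gsub("\n", "")
  return newlines + 1
end

local sample = "first\nsecond\r\nthird"
print(line_of(sample, 1))   -- offset of "f" in "first"
print(line_of(sample, 7))   -- offset of "s" in "second"
print(line_of(sample, 15))  -- offset of "t" in "third"
```

Because only "\n" is counted, "\r\n" line endings are handled the same way as plain "\n".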
I also can't quite work out how nested capture information is passed to
handling functions (which is why I have scattered Cg, C and Ct semi-randomly
over the grammar). This means that only one item is returned from within a
command_with:
> texlike:match("\\foo{text \\command moretext}")
{
  command = "foo",
  nodes = {
    {
      line = 1,
      text = "text ",
    },
  },
  parameters = {
  },
}
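As a small illustration of how capture values flow, separate from the grammar above: a function capture (patt / func) receives every value produced by the captures nested in patt as its arguments, and Ct gathers every value produced inside it into one table. This would suggest that bracketed_stuff's "function (a) return a end" keeps only the first value and drops the rest. The pattern names below (word, kv, kvlist) are made up for this sketch.

```lua
local lpeg = require("lpeg")
local C, Ct, P, R = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R

-- Each C(...) produces one value: the matched text.
local word = C(R("az")^1)

-- The function capture receives the two nested values as k and v and
-- replaces them with a single table value.
local kv = (word * P("=") * word) /
  function (k, v) return { key = k, value = v } end

-- Ct collects every value produced inside it into one table.
local kvlist = Ct(kv * (P(",") * kv)^0)

local t = kvlist:match("a=b,c=d")
print(#t, t[1].key, t[1].value, t[2].key, t[2].value)
```

A function that names only one parameter, such as function (a) return a end, silently discards any further values, so returning "..." or wrapping the pattern in Ct keeps them all.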
I would also love to be able to check that the environment start and end
match up, but when I tried to do so, my back references from "begin" were no
longer in scope by the time I got to "end". I don't know where to go from here.
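One possible direction for the begin/end check, sketched under assumptions rather than tested against the full grammar: the LPeg manual's long-string example compares an opening and closing delimiter with Cmt and Cb, and the same idiom might apply here. As I read the manual, Cb only sees named groups that are not enclosed in another complete capture, so the named Cg below is deliberately left unwrapped (no function capture around it). The names open_env, close_env, body and env are hypothetical, and body is a placeholder for the real "stuff" rule.

```lua
local lpeg = require("lpeg")
local P, R, C, Cg, Cb, Cmt = lpeg.P, lpeg.R, lpeg.C, lpeg.Cg, lpeg.Cb, lpeg.Cmt

local identifier = (R("AZ", "az", "09") + P("_"))^1

-- Remember the environment name in a named group that stays visible to Cb.
local open_env = P("\\begin{") * Cg(identifier, "env") * P("}")

-- At match time, compare the remembered name with the name after \end{.
local close_env = P("\\end{") *
  Cmt(Cb("env") * C(identifier),
      function (_, _, opened, closed) return opened == closed end) *
  P("}")

-- Placeholder body: anything up to the next \end{.
local body = (1 - P("\\end{"))^0

local env = open_env * body * close_env * -1

print(env:match("\\begin{quote}hello\\end{quote}"))  -- matching names
print(env:match("\\begin{quote}hello\\end{verse}"))  -- mismatched names
```

This sketch does not handle nested environments; a nested \begin would overwrite nothing but would be swallowed by the placeholder body, so the real grammar would need the recursive "stuff" rule in its place.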