- Subject: [LPeg] How can I parse a subset of markdown?
- From: "Soni L." <fakedme@...>
- Date: Fri, 22 Jul 2016 12:36:57 -0300
http://stackoverflow.com/q/38514522/3691554
I'm trying to parse a subset of markdown into a tree with LPeg. The idea
is simple but I'm not sure what I'm doing. The whole spec for the thing
I'm doing is here[1] and yes, that's a master branch github link, there
are still some things I need to work out.
So, the basic idea is that I have: (put in a code block because that's
the only thing that does preformatted text/doesn't strip spaces here)
`> ` blocks, where the space is (greedy, non-backtracking) optional, as
in `lpeg.P(">") * lpeg.P(" ")^-1`.
` ` "blocks" behave like in markdown (i.e. everything until the end of
the line is not interpreted as markdown).
#-###### "blocks" behave like in GFM (i.e. what follows is not
interpreted, except for inline elements). This is easy, with something
similar to:
--
local header = (lpeg.P("#") * lpeg.P("#")^-5 * lpeg.C(non_eol^1)) /
process_header_elements
--
(it's much easier to use a function capture here than doing it in
pure LPeg.)
triple-` blocks are trivial; they're inspired by GitHub markdown.
single-` "blocks" are supported as inline elements in hash blocks.
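For the inline elements inside hash blocks, here is a minimal sketch of what process_header_elements could look like. The definitions of eol and non_eol, the node shape ({ type = ..., text = ... }) and the helper names are my assumptions, not something the spec prescribes. It treats an unpaired backtick as plain text.

```lua
local lpeg = require("lpeg")
local P, C, Ct = lpeg.P, lpeg.C, lpeg.Ct

-- Assumed definitions; the snippet above uses non_eol without defining it.
local eol = P("\r\n") + P("\n")
local non_eol = 1 - eol
local tick = P("`")

-- Hypothetical node constructors; the table shape is an illustration only.
local function text_node(s) return { type = "text", text = s } end
local function code_node(s) return { type = "code", text = s } end

-- A single-backtick span, a run of plain text, or a lone unpaired backtick.
local inline_code = tick * C((non_eol - tick)^1) * tick / code_node
local plain_text  = C((non_eol - tick)^1) / text_node
local stray_tick  = C(tick) / text_node

-- Collect all inline pieces of one header line into a list.
local inline = Ct((inline_code + plain_text + stray_tick)^0)

local function process_header_elements(text)
  return inline:match(text)
end

-- Same shape as the header pattern above: one to six hashes, then the rest.
local header = (P("#") * P("#")^-5 * C(non_eol^1)) / process_header_elements

local parts = header:match("## Using `lpeg.Ct` here")
for i, part in ipairs(parts) do
  print(i, part.type, part.text)
end
```

Note that the captured text keeps the space after the hashes, since the header pattern does not skip it.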
And I think that describes the whole thing really. My main issue is
combining all the parts together, not the individual parsing of each
part. Then I need to collect it all into a table, which should also be
pretty easy.
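One way the combining step might look is an ordered choice over line-level patterns, wrapped in lpeg.Ct so that every block lands in one table. This is only a sketch under assumptions of mine: blocks are line-oriented and do not nest, any other non-empty line becomes a "para" node, blank lines are skipped, the whitespace-prefixed block is left out, and the node fields (type, level, text) are made up for illustration.

```lua
local lpeg = require("lpeg")
local P, C, Ct = lpeg.P, lpeg.C, lpeg.Ct

local eol = P("\r\n") + P("\n")
local non_eol = 1 - eol
local line_end = eol + P(-1)             -- end of line or end of input
local FENCE = string.rep("`", 3)         -- the triple-backtick marker
local fence = P(FENCE)

-- One to six hashes; the rest of the line is the header text.
local header = C(P("#") * P("#")^-5) * C(non_eol^1) * line_end /
  function(hashes, text)
    return { type = "header", level = #hashes, text = text }
  end

-- ">" followed by a greedy, non-backtracking optional space.
local quote = P(">") * P(" ")^-1 * C(non_eol^0) * line_end /
  function(text) return { type = "quote", text = text } end

-- Opening fence line, then every line that does not start with a fence,
-- then the closing fence line. Nothing inside is interpreted.
local fenced = fence * non_eol^0 * eol
  * C((-fence * non_eol^0 * eol)^0)
  * fence * non_eol^0 * line_end /
  function(text) return { type = "code", text = text } end

-- Fallback for any other non-empty line (an assumption, not from the spec).
local para = C(non_eol^1) * line_end /
  function(text) return { type = "para", text = text } end

local blank = eol                        -- consumed, produces no node

-- Ordered choice: the first alternative that matches a line wins.
local document = Ct((fenced + header + quote + blank + para)^0) * P(-1)

local input = table.concat({
  "# Title", "> quoted", FENCE, "local x = 1", FENCE, "plain text", "",
}, "\n")

local tree = document:match(input)
for i, block in ipairs(tree) do
  print(i, block.type, block.text)
end
```

The order of the alternatives matters because LPeg's choice is ordered and does not backtrack into an alternative once it has succeeded, so the more specific patterns go first and the catch-all goes last.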
(Now that I look at it I see that MDXML is *so* much simpler than
markdown that you can probably parse the whole thing with a single
regex. But regex doesn't let me collect into a table like I want.)
[1]: https://github.com/SoniEx2/MDXML/blob/master/README.md
PS: This post may look like shit, I copypasted it from SO.
--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.