- Subject: Re: [LPeg] How can I parse a subset of markdown?
- From: Sean Conner <sean@...>
- Date: Fri, 22 Jul 2016 19:59:56 -0400
It was thus said that the Great Soni L. once stated:
>
> On 22/07/16 08:13 PM, Sean Conner wrote:
> >It was thus said that the Great Soni L. once stated:
> >>http://stackoverflow.com/q/38514522/3691554
> >>
> >>I'm trying to parse a subset of markdown into a tree with LPeg. The idea
> >>is simple but I'm not sure what I'm doing. The whole spec for the thing
> >>I'm doing is here[1] and yes, that's a master branch github link, there
> >>are still some things I need to work out.
> > I'm not exactly sure what you want the resulting table to look like, but
> >going from this minimal example [1]:
> >
> > #Tag
> > ##Attribute
> > ###Value
> > Content
> Invalid. Instead:
Then I suggest you fix https://github.com/SoniEx2/MDXML/blob/master/README.md
as that's where I got the above.
Second, I think I gave you enough to go on your own. *I* am not terribly
interested in writing the LPeg for this.
> #Tag
> ##Attribute
> ###Value
> > Content
>
> should produce:
>
> { -- document root
> { -- root tag
> [tagname_idx] = "Tag",
> ["Attribute"] = "Value",
> "Content"
> }
> }
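Soni's expected output uses `[tagname_idx]`, a bracketed key. In Lua that means the key is the *value* of a variable named `tagname_idx`, not the string "tagname_idx". The thread doesn't say what that variable holds. The sketch below assumes it is a unique sentinel table, which can never be equal to any attribute name (attribute names are strings).

```lua
-- Assumption: tagname_idx is a unique sentinel table. Any table is distinct
-- from every string, so it can never clash with an ##Attribute name.
local tagname_idx = {}

local doc = {                -- document root
  {                          -- root tag
    [tagname_idx] = "Tag",   -- key is the sentinel table itself
    ["Attribute"] = "Value", -- ordinary string key from ##Attribute/###Value
    "Content",               -- array part: content, in document order
  },
}

local root = doc[1]
print(root[tagname_idx])  --> Tag
print(root.Attribute)     --> Value
print(root[1])            --> Content

-- By contrast, `tagname_idx = ...` without brackets is the *string* key
-- "tagname_idx", which would collide with an attribute of that name.
local plain = { tagname_idx = "Tag" }
print(plain.tagname_idx, plain[tagname_idx])  -- prints Tag, then nil
```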
Ah, nice that you finally gave an example of the output. So from here, I
could expect:
#book
##edition
###3
> #name
> Programming In Lua
> #ISBN
> 859037985X
To produce:
{
{
[tagname_idx] = "book",
edition = 3,
{ [tagname_idx] = "name" , "Programming In Lua" },
{ [tagname_idx] = "ISBN" , "859037985X" },
}
}
(I will say this: I liked RFC 7049, the CBOR spec, because it included a
TON of encoding examples.)
> >an initial stab at the problem (untested):
> >
> >[code]
> >
> > I opted to store the "tag" as the [0]th element because that's what
> > LuaXML does when parsing XML documents. This should get you going, though (other
> >things left as an exercise---what if there's a missing tag? Adding in
> >escape sequences. That odd 'raw' mode I didn't understand. Parsing nested
> >data)
> A missing tag should be an error. A missing attribute value should be an
> error. Raw mode means "disable the parser and treat everything as data"
> like XML's <![CDATA[ ]]>. Note that missing tags can only happen when
> you enter a `> ` block.
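Taking Soni's rules at face value (`#` opens a tag, `##`/`###` give an attribute and its value, each `>` adds one nesting level, and a missing tag or missing attribute value is an error), a line-based parser stays short. This is a sketch using plain Lua string patterns rather than LPeg, so it is not the LPeg grammar the thread asks for; `parse`, `split_depth`, and the sentinel `tagname_idx` are hypothetical names. It assumes text belongs to the tag one `>` level above it and that blank lines are ignored. Raw mode and escape sequences are left out because the thread doesn't give their syntax. Under this reading, Sean's `#book` example would need `> > Programming In Lua` to put the text inside `name`.

```lua
-- Hypothetical sentinel key for the tag name (mirrors [tagname_idx]).
local tagname_idx = {}

-- Strip leading ">" markers (each optionally followed by one space) and
-- return the nesting depth plus the remaining text.
local function split_depth(line)
  local depth = 0
  while true do
    local rest = line:match("^> ?(.*)$")
    if not rest then return depth, line end
    depth, line = depth + 1, rest
  end
end

-- Parse the assumed subset into nested tables; raises an error for a
-- missing tag or a missing attribute value.
local function parse(src)
  local root = {}               -- document root: list of top-level tags
  local open = { [-1] = root }  -- open[d]: current tag at depth d
  local pending                 -- attribute still waiting for its ###value
  local lineno = 0

  -- Drop tags at depth d and deeper; later lines can't extend them.
  local function close_from(d)
    while open[d] ~= nil do
      open[d] = nil
      d = d + 1
    end
  end

  for line in (src .. "\n"):gmatch("(.-)\r?\n") do
    lineno = lineno + 1
    local depth, text = split_depth(line)
    local function fail(msg)
      error(("line %d: %s"):format(lineno, msg), 0)
    end

    if text ~= "" then  -- blank lines are ignored (assumption)
      if pending and not text:match("^###") then
        fail("missing value for attribute '" .. pending.name .. "'")
      end

      if text:match("^###") then            -- ###Value
        if not pending or pending.depth ~= depth then
          fail("value without a preceding ##attribute")
        end
        pending.tag[pending.name] = text:sub(4)
        pending = nil
      elseif text:match("^##") then         -- ##Attribute
        if not open[depth] then fail("attribute without a tag (missing tag)") end
        pending = { tag = open[depth], name = text:sub(3), depth = depth }
      elseif text:match("^#") then          -- #Tag: child of depth - 1
        local parent = open[depth - 1]
        if not parent then fail("tag without an enclosing tag (missing tag)") end
        local tag = { [tagname_idx] = text:sub(2) }
        parent[#parent + 1] = tag
        close_from(depth)
        open[depth] = tag
      else                                  -- content of the tag one level up
        if depth == 0 then fail("content must be inside a '> ' block") end
        local parent = open[depth - 1]
        if not parent then fail("content without a tag (missing tag)") end
        parent[#parent + 1] = text
        close_from(depth)
      end
    end
  end

  if pending then
    error("missing value for attribute '" .. pending.name .. "'", 0)
  end
  return root
end
```

For example, `parse("#Tag\n##Attribute\n###Value\n> Content")` returns a table whose first element has `[tagname_idx] == "Tag"`, `Attribute == "Value"`, and `[1] == "Content"`, matching Soni's expected shape. Values stay strings, so `###3` yields `"3"`.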
If you have any other LPeg questions, I'll be happy to answer them. I'm
not up to writing the code for you though.
> >[1] And I'm wondering why you even want this, when you could just use
> > Lua directly, or JSON, or YAML, or *any number of existing
> > half-documented markup languages masquerading as a "standard"* but
> > I'll take you at face value and not ask WTF?
> >
> >
> I can use this for config files, because it's a clean config file format
> unlike XML, and I can also use this to generate XML documents (e.g.
> XHTML webpages) because I designed it that way.
At work, most of the components are configured using XML, except for the
one component I wrote in Lua, which uses a Lua file for its configuration.
And amazingly enough, the ops group has no problems with it. Neither does
the tester (or the rest of the developers in our department).
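The post doesn't show how that Lua config file is loaded. One common approach, sketched here assuming Lua 5.2 or later, is to run the config text in a fresh, empty environment so its top-level assignments become fields of a table instead of globals. `load_config` is a hypothetical helper name; `load` with mode and environment arguments is standard Lua 5.2+. Because the environment is empty, the config cannot call `print`, `os.execute`, and so on, though this alone is not a complete sandbox (nothing limits CPU time or memory).

```lua
-- Minimal sketch (assumes Lua 5.2+): run Lua-syntax config text in a fresh,
-- empty environment so its assignments land in `env`, not in _G.
local function load_config(text, chunkname)
  local env = {}
  -- mode "t" accepts only source text, rejecting precompiled bytecode
  local chunk, err = load(text, chunkname or "=config", "t", env)
  if not chunk then return nil, err end      -- syntax error
  local ok, runerr = pcall(chunk)
  if not ok then return nil, runerr end      -- runtime error in the config
  return env
end

-- Usage: values assigned at the top level become fields of the result.
local cfg = assert(load_config([[
version  = "1.0"
encoding = "utf-8"
books = { { name = "Programming in Lua", edition = 3, ISBN = "859037985X" } }
]]))
print(cfg.books[1].ISBN)  --> 859037985X
```

For a file on disk, `loadfile(path, "t", env)` takes the same mode and environment arguments.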
I don't really see what's wrong with:
version = "1.0"
encoding = "utf-8"
programming =
{
languages =
{
{ name = "Lua" , link = "http://www.lua.org/" },
{ name = "Python" , link = "https://www.python.org/" },
},
books =
{
{ name = "Programming in Lua" , edition = 3 , ISBN = "859037985X" },
},
}
as that is *way* more concise than https://raw.githubusercontent.com/SoniEx2/MDXML/master/example.md
and could just as easily be converted to XML. Another issue I see with your
format is the use of repeated ">" to indicate nesting level. It's the same
issue I have with Python and its significant whitespace to indicate nesting
level: it makes reorganizing a bit more onerous.
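As for converting such a table to XML, here is one possible mapping, chosen for illustration and not taken from the thread: string keys become child elements (sorted so the output is stable), and array entries become `<item>` elements. `to_xml` and `escape` are hypothetical helpers; this sketch does not check that keys are legal XML element names and ignores non-sequence numeric keys.

```lua
-- Escape the five XML special characters in text content.
local function escape(s)
  return (tostring(s):gsub("[&<>\"']", {
    ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;",
    ['"'] = "&quot;", ["'"] = "&apos;",
  }))
end

-- Append the XML for `value`, wrapped in element `name`, to the list `out`.
-- Assumed mapping: array entries -> <item>, string keys -> child elements.
local function to_xml(name, value, out)
  out = out or {}
  if type(value) == "table" then
    out[#out + 1] = "<" .. name .. ">"
    for _, v in ipairs(value) do to_xml("item", v, out) end
    local keys = {}
    for k in pairs(value) do
      if type(k) == "string" then keys[#keys + 1] = k end
    end
    table.sort(keys)                  -- pairs() order is unspecified
    for _, k in ipairs(keys) do to_xml(k, value[k], out) end
    out[#out + 1] = "</" .. name .. ">"
  else
    out[#out + 1] = "<" .. name .. ">" .. escape(value) .. "</" .. name .. ">"
  end
  return out
end

local book = { name = "Programming in Lua", edition = 3 }
print(table.concat(to_xml("book", book)))
--> <book><edition>3</edition><name>Programming in Lua</name></book>
```

Emitting array entries as a generic `<item>` keeps the converter simple; a real mapping would probably want per-key element names (for example `<language>` inside `<languages>`), which Lua tables don't carry on their own.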
-spc (But hey, it's your project---knock yourself out)