- Subject: LPeg: parsing text with wikilinks
- From: Александр Машин <alex.mashin@...>
- Date: Mon, 25 May 2015 01:47:13 +0700
Dear all,
I am trying to write a parser that processes wikitext, i.e. text
containing wikilinks of the form [[page|optional alias]] (but not escaped
links such as [[:page]]).
I want it to return the input string itself (with some alterations), a
list of the pages referred to, and the character most likely used as the
list separator in that string.
I wrote the following grammar:
wikitext <- {| { {| ( prefix? wikilink )+ |} tail } |}
wikilink <- unescapedopen page alias? close
page <- { ( !close !pipe . )+ }
alias <- pipe ( !close . )*
tail <- .*
prefix <- ( separator / ( !unescapedopen . ) )+
open <- "[["
unescapedopen <- open !escape
close <- "]]"
pipe <- "|"
escape <- ":"
separator <- {:separator: [,;*#] :} space*
space <- %s
After applying it (re.match) to the example line "Perhaps,
[[Peter|Simon]], or [[Paul]], so they say", I got:
table {
  1 = Perhaps, [[Peter|Simon]], or [[Paul]], so they say
  2 = table {
    1 = Peter
    2 = Paul
    separator = ,
  }
}
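For reference, the call can be sketched roughly as follows. This is only a minimal sketch, assuming LPeg and its re module are installed; the variable names (grammar, input, result) are illustrative. The grammar is placed in a level-1 long string ([=[ ... ]=]) because it contains "]]", which would terminate an ordinary [[ ... ]] long string.

```lua
local re = require "re"  -- regex-like front end shipped with LPeg

-- The grammar from above, unchanged. A level-1 long string is used
-- because the grammar text itself contains "]]".
local grammar = re.compile [=[
  wikitext      <- {| { {| ( prefix? wikilink )+ |} tail } |}
  wikilink      <- unescapedopen page alias? close
  page          <- { ( !close !pipe . )+ }
  alias         <- pipe ( !close . )*
  tail          <- .*
  prefix        <- ( separator / ( !unescapedopen . ) )+
  open          <- "[["
  unescapedopen <- open !escape
  close         <- "]]"
  pipe          <- "|"
  escape        <- ":"
  separator     <- {:separator: [,;*#] :} space*
  space         <- %s
]=]

local input = "Perhaps, [[Peter|Simon]], or [[Paul]], so they say"

-- result[1] is the whole matched line; result[2] is the inner table
-- holding the page names and the named "separator" field.
local result = grammar:match(input)

print(result[1])
print(table.concat(result[2], ", "))
print(result[2].separator)
```

re.match(input, grammar_string) would do the same in one call; compiling first only avoids re-parsing the grammar on every match.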
This is close to what I want.
However, there are some issues:
1) Can I make the outer table's keys strings, i.e. ['full'] instead of
[1] and ['items'] instead of [2]? I experimented with named group
captures, but without success.
2) Can the number of nested captures be reduced?
3) Most importantly: I want a string constant (e.g. "Name::") to be
inserted after every matched <unescapedopen>, and the first capture,
which returns the whole line, should contain this constant:
"...[[Name::Paul]]..." rather than "...[[Paul]]...". This may have
something to do with substitution captures; I tried them but could not
get them to work. Can it be done at all?
Alexander Mashin