Dear all,

I am trying to write a parser for wikitext, i.e. text containing wikilinks of the form [[page|alias]], where the alias part is optional. Escaped links such as [[:page]] should be ignored.

I want it to return the input string itself (with some alterations), a list of the pages it links to, and the symbol most likely used as the list separator in that string.

I wrote the following grammar:

    wikitext      <- {| { {| ( prefix? wikilink )+ |} tail } |}
    wikilink      <- unescapedopen page alias? close
    page          <- { ( !close !pipe . )+ }
    alias         <- pipe ( !close . )*
    tail          <- .*
    prefix        <- ( separator / ( !unescapedopen . ) )+
    open          <- "[["
    unescapedopen <- open !escape
    close         <- "]]"
    pipe          <- "|"
    escape        <- ":"
    separator     <- {:separator: [,;*#] :} space*
    space         <- %s
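
For anyone who wants to try the grammar, a minimal script along these lines should load and run it. It assumes that LPeg and its bundled re module are installed; the variable names (grammar, subject, result) are only illustrative.

```lua
-- Assumes LPeg and its `re` module are installed (for example via LuaRocks).
local re = require "re"

-- The grammar sits in a level-1 long string because it contains "]]".
local grammar = re.compile [=[
    wikitext      <- {| { {| ( prefix? wikilink )+ |} tail } |}
    wikilink      <- unescapedopen page alias? close
    page          <- { ( !close !pipe . )+ }
    alias         <- pipe ( !close . )*
    tail          <- .*
    prefix        <- ( separator / ( !unescapedopen . ) )+
    open          <- "[["
    unescapedopen <- open !escape
    close         <- "]]"
    pipe          <- "|"
    escape        <- ":"
    separator     <- {:separator: [,;*#] :} space*
    space         <- %s
]=]

local subject = "Perhaps, [[Peter|Simon]], or [[Paul]], so they say"
local result = grammar:match(subject)

print(result[1])                       -- the whole matched line
print(table.concat(result[2], " / "))  -- the captured page names
print(result[2].separator)             -- the captured separator symbol
```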


After applying it (re.match) to the example line "Perhaps, [[Peter|Simon]], or [[Paul]], so they say", I got:

table {
    1 = Perhaps, [[Peter|Simon]], or [[Paul]], so they say
    2 = table {
        1 = Peter
        2 = Paul
        separator = ,
    }
}

This is close to what I want.

However, there are some issues:
1) Can I make the outer table's keys strings, i.e. ['full'] instead of [1] and ['items'] instead of [2]? I experimented with named group captures, but without success.
2) Can the number of nested captures be reduced?
3) Most importantly: I want a string constant (e.g. "Name::") to be inserted after every <unescapedopen> that is found, and the first capture, which returns the whole line, should contain this constant: "...[[Name::Paul]]..." rather than "...[[Paul]]...". This probably involves substitution captures; I tried them but could not get them to work. Can it be done at all?
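
To make the desired result concrete, here is a rough two-pass sketch of what the three points above ask for. It is only one possible arrangement, not a confirmed solution: the names parse, collect, rewrite, mark and PREFIX are hypothetical, the second pass uses a small separate grammar with a substitution capture and a string capture, and it may treat malformed input (for example an unclosed "[[") differently from the grammar above.

```lua
-- Assumes LPeg and its `re` module are installed (for example via LuaRocks).
local re = require "re"

local PREFIX = "Name::"  -- hypothetical example value of the constant

-- Pass 1: the original rules with only one table capture left; it collects
-- the page names as list items and the separator as a named field.
local collect = re.compile [=[
    wikitext      <- {| ( prefix? wikilink )+ |} tail
    wikilink      <- unescapedopen page alias? close
    page          <- { ( !close !pipe . )+ }
    alias         <- pipe ( !close . )*
    tail          <- .*
    prefix        <- ( separator / ( !unescapedopen . ) )+
    open          <- "[["
    unescapedopen <- open !escape
    close         <- "]]"
    pipe          <- "|"
    escape        <- ":"
    separator     <- {:separator: [,;*#] :} space*
    space         <- %s
]=]

-- Pass 2: a substitution capture copies the text unchanged, except that the
-- "[[" opening each unescaped link is replaced by the string stored in `mark`.
local rewrite = re.compile([=[
    text <- {~ ( link / . )* ~}
    link <- "[[" -> mark !":" ( !"]]" !"|" . )+ ( "|" ( !"]]" . )* )? "]]"
]=], { mark = "[[" .. PREFIX })

-- Combine both passes into a table with string keys, built in plain Lua.
local function parse(subject)
    local items = collect:match(subject)
    if not items then return nil end   -- no wikilink found
    local separator = items.separator
    items.separator = nil              -- keep `items` a plain list
    return {
        full = rewrite:match(subject),
        items = items,
        separator = separator,
    }
end

local r = parse("Perhaps, [[Peter|Simon]], or [[Paul]], so they say")
print(r.full)
print(table.concat(r.items, " / "), r.separator)
```

Building the outer table in Lua sidesteps the named-capture question in point 1 and leaves a single table capture in the first grammar, at the cost of scanning the string twice.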

Alexander Mashin