lua-users home
lua-l archive

On Wed, Feb 1, 2012 at 6:21 PM, Axel Kittenberger <axkibe@gmail.com> wrote:
>> I think it's a shame MediaWiki didn't ever pick parse->expand->serialize
>> as their infrastructure project instead of taping Lua on the side--and
>> adding Lua as a decorator to a structured document is really easy.
>
> I once started to write a wiki parser that works not by regexps like
> most do, but by handcrafting token building / grammar trees just like
> code compilers work. I soon got bogged down by the fact that
> there is no such thing as a syntax violation in wiki markup,

...welcome to HTML. :-)

As I understand it HTML5 mostly nails down the precise bug-compatible
tokenizing and parsing everybody already implements. But MediaWiki
doesn't have the "can't change IE6" problem; they are the only
significant reader of MediaWiki markup. They can deprecate the worst
of the ambiguous constructs, and/or assign them a more rational
meaning. If they really want to be radical they can parse-on-save and
tell people to fix their unclosed parens before accepting.
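
A minimal sketch of what parse-on-save could look like, in Python for
brevity. The delimiter table and the names find_problems and save are
made up for illustration; a real wiki dialect would define its own
constructs, and this is not how MediaWiki handles edits.

```python
# Hypothetical delimiter table; a real wiki dialect would define its own.
PAIRS = {"[[": "]]", "{{": "}}", "(": ")"}
# Try longer tokens first so "[[" is not read as something shorter.
OPENERS = sorted(PAIRS, key=len, reverse=True)
CLOSERS = sorted(set(PAIRS.values()), key=len, reverse=True)


def find_problems(text):
    """Return a list of (message, offset) for unbalanced delimiters."""
    problems = []
    stack = []  # (opener, offset) of delimiters that are still open
    i = 0
    while i < len(text):
        opener = next((t for t in OPENERS if text.startswith(t, i)), None)
        closer = next((t for t in CLOSERS if text.startswith(t, i)), None)
        if opener:
            stack.append((opener, i))
            i += len(opener)
        elif closer:
            if stack and PAIRS[stack[-1][0]] == closer:
                stack.pop()
            else:
                problems.append(("unexpected " + closer, i))
            i += len(closer)
        else:
            i += 1
    # Anything left on the stack was opened but never closed.
    problems.extend(("unclosed " + o, pos) for o, pos in stack)
    return problems


def save(text):
    """Accept the edit only if no delimiter problems are found."""
    problems = find_problems(text)
    if problems:
        raise ValueError(problems)
    return text
```

With this, save("see [[Lua]] (a language)") goes through, while
save("see [[Lua (a language") is refused with the offsets of the two
openers that were never closed. The point is only that rejection at
save time gives the author something concrete to fix.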

Stack Overflow points me at https://github.com/JGM/PEG-Markdown/ for
Markdown syntax at least. I think there are tractable parsers for Ward
Cunningham's Wiki syntax, but as you say they all start to look like
projects in compiler-style automatic syntax error repair.

> and you
> always need the parser to backtrack and take another way to
> repair the syntax violation of a prior assumption (like an opening tag
> not being closed or so). I still think it's doable, but it's quite a bit
> more difficult than writing a parser for a computer language.

I know what you mean, but technically speaking it *is* a language even
if it's the set of all ASCII strings--I mean, what text is rejected?
Nobody can figure out where it would sit in the formal language
hierarchy. It just was not designed to be interpreted through the
lex/parse chain that computer languages have used since the late(?) 1970s.
IMO it was not designed at all. Nobody knows how any two obscure
features interact; the definition is "try it and see what the single
implementation does". :-(

HTML was the big one, but I find the mirage of meaning in MediaWiki
syntax quite depressing given the Wikipedia context. But I'm back to
shouting at clouds. Perhaps the influx of people searching lua-l for
"mediawiki" will find this message in a bottle. (And flame me.)

The tenuous connection to Lua-the-language is that noise words like
"then" and "[for ...] do" exist specifically to reject programs: without
them, the ergonomics of the language lead to bugs rather than syntax
errors, and the cost of those bugs is greater than the cost of having to
type or read "then".
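
A toy illustration of that trade-off, in Python rather than Lua and
with a grammar invented for the purpose (it is not Lua's real grammar;
parse_if and its require_then flag are hypothetical). With the noise
word required, a missing "then" is rejected; with it optional, the
parser has to guess where the condition ends, and here the guess is
wrong without any error being raised.

```python
def parse_if(tokens, require_then):
    """Split 'if <cond...> then <body...> end' into (cond, body)."""
    if len(tokens) < 2 or tokens[0] != "if" or tokens[-1] != "end":
        raise SyntaxError("expected 'if ... end'")
    inner = tokens[1:-1]
    if "then" in inner:
        i = inner.index("then")
        return inner[:i], inner[i + 1:]
    if require_then:
        # The noise word gives the parser grounds to reject the input.
        raise SyntaxError("'then' expected")
    # Lenient mode: guess that the condition is exactly one token.
    return inner[:1], inner[1:]


# Intended meaning: condition "ready or force", body "launch".
tokens = "if ready or force launch end".split()

# Lenient mode accepts the input but splits it in the wrong place.
cond, body = parse_if(tokens, require_then=False)
print(cond, body)

# Strict mode refuses to guess.
try:
    parse_if(tokens, require_then=True)
except SyntaxError as err:
    print("rejected:", err)
```

The lenient call returns a condition of just "ready" and a body that
starts with "or", which is the bug-instead-of-syntax-error outcome.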

Jay