On Fri, May 24, 2013 at 06:15:55PM -0700, Wesley Smith wrote:
> > I just finished writing a complete tokenizer in C as an almost direct
> > transliteration of the HTML5 tokenizing rules. I'm confident that it can't
> > be done with LPeg, not if you want to be fully standards compliant and
> > handle pathological cases, such as spammers might abuse.
> 
> I'd be surprised if this was the case.  Do you have a particular
> example in mind?

Just read the specification:

	http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html
	http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html

Even excluding JavaScript, many of the state transitions and mid-parsing
node fixups are complex enough that the burden should be on the person
claiming it can be done to show that it can and, more importantly, how. I'm
content with the conjecture that it can't be done in practice using pure LPeg.
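
For illustration only, here is a minimal sketch in C of what a direct
transliteration of the tokenizing rules looks like. It is not the tokenizer
mentioned above. It models just three of the spec's many states (data, tag
open, tag name), and the names tok_state, S_DATA, S_TAG_OPEN, S_TAG_NAME and
count_start_tags are hypothetical. End tags, comments, attributes, and
character references are deliberately left out, and isalpha() is assumed to
run in the default "C" locale so that it matches ASCII letters only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names; a tiny subset of the spec's tokenizer states. */
enum tok_state { S_DATA, S_TAG_OPEN, S_TAG_NAME };

/* Count start tags such as <p> or <div class=x> in a NUL-terminated string.
 * Simplification: anything the three states below do not cover is treated
 * as plain text, which the real tokenizing rules do not do. */
size_t count_start_tags(const char *input)
{
    enum tok_state state = S_DATA;
    size_t count = 0;

    assert(input != NULL);

    for (const char *p = input; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;

        switch (state) {
        case S_DATA:
            if (c == '<')
                state = S_TAG_OPEN;
            break;

        case S_TAG_OPEN:
            if (isalpha(c)) {
                count++;            /* a start tag token begins here */
                state = S_TAG_NAME;
            } else if (c != '<') {
                state = S_DATA;     /* the '<' was just text */
            }
            /* A second '<' means the first was text; stay in S_TAG_OPEN. */
            break;

        case S_TAG_NAME:
            if (c == '>')
                state = S_DATA;     /* tag finished, back to plain data */
            break;
        }
    }
    return count;
}
```

Even this toy version gets real input wrong: a '>' inside a quoted attribute
value ends the tag early, and a "<p>" inside a comment is counted as a tag.
Handling those cases correctly is what the additional states in the
specification are for.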