Florian Berger wrote:

Chris Marring wrote:
> You could just use luaexpat and then extract out what you need. This
> is especially easy with the Lua Object Model feature, which simply
> returns the HTML as a hierarchy of tables. Expat is very good at
> grokking all the twisty bits of HTML, so this could help get past all
> that...

How well does LuaExpat work if HTML is not clean or valid?

My experience with expat (used directly, NOT through LuaExpat) is that it makes a valiant effort to deal with a few things. But for the most part, invalid HTML generates an error and the parse aborts. I think there is a way to get expat to continue after an error; for instance, I think you can get it to handle the case where an EndElement has the wrong name. But for the most part invalid HTML, like invalid C, is hard to fix and make any sense of. And I don't know how easy it would be to get LuaExpat to be tolerant of errors.
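
To make the "generates an error and aborts" behaviour concrete, here is a rough sketch. It does not use LuaExpat; it goes through Python's standard-library binding to the same expat library (xml.parsers.expat), on the assumption that the underlying parser treats the input the same way. The helper name try_parse and the two sample strings are made up for illustration.

```python
from xml.parsers import expat


def try_parse(markup):
    """Feed markup to a fresh expat parser; return (ok, message)."""
    parser = expat.ParserCreate()
    try:
        # Second argument True marks this as the final chunk of input.
        parser.Parse(markup, True)
    except expat.ExpatError as err:
        # expat stops at the first error; err.code identifies it.
        return False, expat.ErrorString(err.code)
    return True, "ok"


# Hypothetical samples: the second leaves <br> unclosed, which is common
# in real-world HTML but not well-formed XML.
clean = "<html><body><p>Hello</p></body></html>"
sloppy = "<html><body><p>Hello<br></body></html>"

print(try_parse(clean))
print(try_parse(sloppy))
```

The clean string parses, while the sloppy one stops at the first end tag that does not match the open element, and nothing after that point is reported. I would expect LuaExpat to surface the same kind of failure, but I have not checked that.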

My general rule is "always use valid HTML" :-)

