On Sun, Nov 24, 2013 at 10:41 PM, Craig Barnes <craigbarnes85@gmail.com> wrote:
>> [1] http://stevedonovan.github.io/Penlight/api/modules/pl.xml.html#parsehtml
> Doesn't work for me. Am I doing something wrong?

Nope, it's an actual bug. It was expecting DOCTYPE in caps, which of
course is not how HTML works, since the doctype is case-insensitive.
With that fixed, it parses the well-formed HTML fine, but I must
emphasize that this is a 'relaxed' mode of a dinky XML parser and it
really cannot cope with badly-formed HTML. So I can't recommend it
for people who need to deal with the real web.
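
To illustrate the DOCTYPE case issue: Lua patterns have no
case-insensitive flag, so one simple approach is to lowercase the
input before matching. This is only a sketch and not the actual
Penlight code; has_doctype is a made-up name.

```lua
-- Hypothetical helper, not part of Penlight: reports whether a document
-- starts with a DOCTYPE declaration, ignoring letter case.
local function has_doctype(s)
  -- Lowercase first, then anchor the match at the start of the string,
  -- allowing leading whitespace.
  return s:lower():find("^%s*<!doctype") ~= nil
end

print(has_doctype("<!DOCTYPE html><html></html>"))  --> true
print(has_doctype("<!doctype html><html></html>"))  --> true
print(has_doctype("<html></html>"))                 --> false
```

Lowercasing the whole string is wasteful for large documents; a real
parser would more likely check only the first few characters.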

It coped ok with the Slashdot front page, but that's fairly decent HTML.

Parsing the well-formed HTML gives the following LOM table:

{
  tag = "html",
  attr = {
    lang = "en"
  },
  {
    tag = "head",
    {
      tag = "meta",
      attr = { charset = "utf-8"  },
      empty = 1,
    },
    {
      tag = "title",
      "Test",
    },
  },
  {
    tag = "body",
    {
      tag = "h1",
      "Test",
    }
  }
}

(Cleaned up from pretty.dump)
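
A LOM table like this can be walked with plain Lua: child elements
and text live in the array part, while tag and attr are named fields.
The sketch below rebuilds the table as a literal and uses two
hypothetical helpers, find_tag and text_of, which are not Penlight
functions.

```lua
-- The LOM table shown above, written as a Lua literal.
local doc = {
  tag = "html",
  attr = { lang = "en" },
  {
    tag = "head",
    { tag = "meta", attr = { charset = "utf-8" }, empty = 1 },
    { tag = "title", "Test" },
  },
  {
    tag = "body",
    { tag = "h1", "Test" },
  },
}

-- Hypothetical helper: depth-first search for the first element
-- with the given tag name; returns nil if there is none.
local function find_tag(node, name)
  if node.tag == name then return node end
  for _, child in ipairs(node) do
    if type(child) == "table" then
      local found = find_tag(child, name)
      if found then return found end
    end
  end
  return nil
end

-- Hypothetical helper: concatenate all text beneath a node.
-- Children are either strings (text) or tables (elements).
local function text_of(node)
  local parts = {}
  for _, child in ipairs(node) do
    if type(child) == "string" then
      parts[#parts + 1] = child
    else
      parts[#parts + 1] = text_of(child)
    end
  end
  return table.concat(parts)
end

print(text_of(find_tag(doc, "title")))       --> Test
print(find_tag(doc, "meta").attr.charset)    --> utf-8
print(doc.attr.lang)                         --> en
```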

steve d.