Expat is a very strict parser, so if you want to use HTML entities
(other than amp, lt, gt, apos, and quot, which XML predefines) you
need to specify an XHTML doctype such as

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

at the beginning of your XML string. You can even keep it in a
separate string and concatenate it just before parsing. :-)
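
A rough sketch of the "separate string" idea, in case it helps. The
names XHTML_DOCTYPE and with_doctype are made up for illustration, and
I am assuming that require("lxp") returns the module table. The sketch
only shows the concatenation and the parse call; what happens to the
&nbsp; reference once the doctype is present (and whether any of your
callbacks see it) is something to check against your own luaexpat
build.

```lua
-- Sketch only: XHTML_DOCTYPE and with_doctype are hypothetical names.
local XHTML_DOCTYPE = [[<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
]]

-- Prepend the doctype to an XML string just before parsing.
local function with_doctype(xml)
  return XHTML_DOCTYPE .. xml
end

local data = '<aa>Hi&quot;<bb>there</bb><cc>bad&nbsp;luck</cc><dd>Bye</dd></aa>'

-- Only try to parse if luaexpat is installed (assumption: require
-- returns the lxp table).
local ok, lxp = pcall(require, "lxp")
if ok then
  local parser = lxp.new{
    StartElement = function(p, name) print('start of', name) end,
  }
  local a, b = parser:parse(with_doctype(data))
  if a then
    a, b = parser:parse()   -- no argument: signals end of document
  end
  if not a then
    local l, c = parser:pos()
    print(string.format("ERROR: [%d,%d] %s", l, c, b))
  end
  parser:close()
end
```

Keeping the doctype in its own string means the stored XML files stay
untouched; only the string handed to the parser changes.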

--
Fabio Mascarenhas


On Thu, Mar 12, 2009 at 11:07 AM, Elbers, H.P. <H.P.Elbers@boskalis.nl> wrote:
> Hello,
>
> I'm using luaexpat (from the Kepler project) to parse xml files that contain
> entities like &nbsp;
> These cause the parser to abort with the error 'undefined entity'...
>
> Is there a way to tell luaexpat to handle these errors 'gently', or do I have
> to preprocess the input to catch them?
>
> Thanks,
>     Hans.
> ________________________________
> Example:
>
> #!/usr/bin/env lua
> require("lxp")
>
> function doStart(parser, name, attr)  print('start of', name) end
> data = '<aa>Hi&quot;<bb>there</bb><cc>bad&nbsp;luck</cc><dd>Bye</dd></aa>'
> local xml = lxp.new{ StartElement = doStart}
>
> a,b = xml:parse(data)
> if not a then
>   local l,c = xml:pos()
>   print(string.format("ERROR: [%d,%d] %s", l, c, b))
> else
>   xml:close()
> end
>
> Output:
>
> start of        aa
> start of        bb
> start of        cc
> ERROR: [1,34] undefined entity
>