I have been tinkering with a native XML parser in Lua for a
while, but left it in a 'not-quite-completed' state over the
last couple of months. The recent flurry of XML-related
activity on Lua-L has prompted me to get it working and put
together an initial release, if only to save any possible
duplication of effort. It is a bit rushed and not as well
documented as I would like, but it is relatively
robust/compliant and certainly usable. The module is
available at:

http://www.passtheaardvark.com/lua/LuaXML-0.0.0.tgz

The module implements a non-validating XML stream parser with a
handler-based event API (conceptually similar to SAX), which can
be used to post-process the event data as required (e.g. into a
tree). The current functionality is:
  * Tokenises well-formed XML (relatively robustly)
  * Flexible handler-based event API
  * Parses/generates events for all XML elements, i.e.:
      - Tags
      - Text
      - Comments
      - CDATA
      - XML Decl
      - Processing Instructions
      - DOCTYPE declarations
  * Provides limited well-formedness checking
    (checks for basic syntax & balanced tags only)
  * Flexible whitespace handling (selectable)
  * Entity handling (selectable)

The limitations are:
  * Non-validating
  * No charset handling
  * No namespace support
  * Shallow well-formedness checking only (fails
    to detect most semantic errors)
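
To illustrate the handler-based event idea and the balanced-tag
check described above, here is a minimal toy sketch. It is not the
LuaXML code or its API: the function name parse and the handler
fields starttag, endtag and text are made up for this example, and
it ignores attributes, comments, CDATA, entities and everything
else a real tokeniser has to deal with.

```lua
-- Hypothetical toy tokeniser: NOT the LuaXML API, only the general idea.
-- It calls handler.starttag / handler.endtag / handler.text (assumed names)
-- and raises an error when tags are not balanced.
local function parse(xml, handler)
  local stack = {}   -- names of currently open tags
  local pos = 1
  while true do
    -- optional "/", tag name, attributes (ignored here), optional "/"
    local s, e, close, name, _, empty =
      string.find(xml, "<(/?)([%w_:.-]+)(.-)(/?)>", pos)
    if not s then break end

    -- report any non-whitespace text found before this tag
    local text = string.sub(xml, pos, s - 1)
    if string.find(text, "%S") and handler.text then handler.text(text) end

    if close == "/" then
      -- closing tag: must match the most recently opened tag
      local open = table.remove(stack)
      if open ~= name then
        error("unbalanced tag: expected </" .. tostring(open) ..
              ">, got </" .. name .. ">")
      end
      if handler.endtag then handler.endtag(name) end
    else
      if handler.starttag then handler.starttag(name) end
      if empty == "/" then
        -- empty element such as <br/>: report start and end together
        if handler.endtag then handler.endtag(name) end
      else
        stack[#stack + 1] = name
      end
    end
    pos = e + 1
  end
  if #stack > 0 then error("unclosed tag: <" .. stack[#stack] .. ">") end
end

-- Usage: a handler that simply records an event trace.
local events = {}
local recorder = {
  starttag = function(name) events[#events + 1] = "start:" .. name end,
  endtag   = function(name) events[#events + 1] = "end:" .. name end,
  text     = function(s)    events[#events + 1] = "text:" .. s end,
}
parse("<doc><item>hello</item><br/></doc>", recorder)
print(table.concat(events, " "))
-- prints: start:doc start:item text:hello end:item start:br end:br end:doc
```

Because the handler is just a table of callbacks, the same
tokeniser can feed a trace printer, a tree builder or anything
else without being changed.
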
I believe that the parsing code is relatively robust/compliant.
I have run it (informally) against the OASIS XML validation
suite and believe that it correctly tokenises all of the test
data, at least at a cursory level (some of which is pretty
grim!), with the exception of local entity definitions (which
aren't currently parsed).

Layered on top of the parser is the handler API (which is
defined in the distribution). A number of standard handlers are
included in the distribution:
  * a 'print' handler, which just dumps an event trace
  * a 'dom'-like handler, which creates a tree-based
    representation (this can support arbitrary XML content
    but is a bit unwieldy)
  * a 'simpleTree' handler, which creates a much more
    Lua-friendly tree structure (but which has some
    limitations, particularly when working with mixed content)
(See xml.lua and handler.lua for more info.)
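
As a rough illustration of why a Lua-friendly tree is convenient
but struggles with mixed content, here is a small sketch of a
tree-building handler. It is an assumption-laden toy, not the
'simpleTree' handler from the distribution: the function
newTreeHandler, the callback names and the _text / _many fields
are all invented for this example.

```lua
-- Hypothetical tree-building handler: NOT the distribution's simpleTree.
-- Children are stored under their tag name; text goes into node._text.
-- (Assumes no element is literally named "_text" or "_many".)
local function newTreeHandler()
  local root = {}
  local stack = { root }     -- stack[#stack] is the node being filled
  local h = { root = root }

  function h.starttag(name)
    local node = {}
    local parent = stack[#stack]
    local existing = parent[name]
    if existing == nil then
      parent[name] = node                              -- first child with this name
    elseif existing._many then
      existing[#existing + 1] = node                   -- already a list: append
    else
      parent[name] = { _many = true, existing, node }  -- promote to a list
    end
    stack[#stack + 1] = node
  end

  function h.text(s)
    local node = stack[#stack]
    -- All text of a node is concatenated, so its position relative to
    -- child elements is lost: this is the mixed-content limitation.
    node._text = (node._text or "") .. s
  end

  function h.endtag(name)
    table.remove(stack)
  end

  return h
end

-- Usage: drive the handler by hand with the events a parser would emit
-- for <doc><item>one</item><item>two</item></doc>.
local h = newTreeHandler()
h.starttag("doc")
h.starttag("item"); h.text("one"); h.endtag("item")
h.starttag("item"); h.text("two"); h.endtag("item")
h.endtag("doc")
print(h.root.doc.item[2]._text)   -- prints: two
```

Access such as h.root.doc.item[2]._text reads naturally in Lua,
but a fragment like <p>a<b>c</b>d</p> would end up with
_text = "ad" on p, with no record of where the b element sat.
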
The code is relatively well documented (in the module headers)
and it should be possible to work things out from this and
the test app (textxml.lua) which is also useful for exploring
the event parsing and tree-building handlers.
Jay's expat-based parser would probably be a better choice
when available; however, this one seems to be pretty reasonable
both in speed & compliance. It would be good to come up with a
common event & tree-based API which would allow the two to
be used interchangeably.

Interestingly, I also had a go at an XML-RPC marshaller &
unmarshaller and got this to the point of working as a CGI
server, but this implementation isn't as complete as Jay's
(and is quite a lot uglier, from looking at the code).

Hopefully this might be useful - I am happy to take
suggestions/comments on how to merge this with Jay's
work if anyone is interested.

PaulC