Lua Xml

The following is some sample code for handling XML. It is divided into two sections: parsers written purely in Lua, and parsers that contain C code plus a Lua binding. Credit to the authors is given where appropriate.

Toolkits

LazyKit is a collection of XML processing tools. Its primary purpose is to provoke discussion of XML tools in Lua.

Lua only

Lua XML Parser

LuaXML-0.0.0, From Paul Chakravarti [1] (link broken)

The module implements a non-validating XML stream parser with a handler-based event API (conceptually similar to SAX), whose events can be post-processed as required (e.g. into a tree).
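To illustrate the handler-based idea, here is a minimal Lua 5.1+ sketch. It is not LuaXML's code or API: the function names `parse_events` and `tree_handler` and the callback names `starttag`, `endtag` and `text` are hypothetical, and only simple markup is handled (no comments, `<?`/`<!` constructs, CDATA or entity decoding).

```lua
-- Minimal sketch of a handler-based (SAX-like) parser, for Lua 5.1+.
-- NOT the LuaXML API: callback names starttag/endtag/text are assumptions.
-- Only simple markup: no comments, "<?", "<!", CDATA or entity decoding.
function parse_events(s, handler)
  local i = 1
  while true do
    local ni, j, close, name, attrs, empty =
      string.find(s, "<(/?)([%w_:%-]+)(.-)(/?)>", i)
    if not ni then break end
    -- report non-blank text found between the previous tag and this one
    local text = string.sub(s, i, ni - 1)
    if handler.text and not string.find(text, "^%s*$") then
      handler.text(text)
    end
    if close == "/" then
      if handler.endtag then handler.endtag(name) end
    else
      local args = {}
      for k, _, v in string.gmatch(attrs, "([%w_:%-]+)%s*=%s*([\"'])(.-)%2") do
        args[k] = v
      end
      if handler.starttag then handler.starttag(name, args) end
      -- an empty-element tag such as <br/> is reported as start + end
      if empty == "/" and handler.endtag then handler.endtag(name) end
    end
    i = j + 1
  end
end

-- Example handler: turn the event stream into a nested Lua table.
function tree_handler()
  local root = { children = {} }
  local stack = { root }
  local h = { root = root }
  function h.starttag(name, attrs)
    local node = { name = name, attrs = attrs, children = {} }
    table.insert(stack[#stack].children, node)
    table.insert(stack, node)            -- descend one level
  end
  function h.endtag(name)
    local node = table.remove(stack)     -- climb back up
    if node.name ~= name then error("mismatched end tag </" .. name .. ">") end
  end
  function h.text(t)
    table.insert(stack[#stack].children, t)
  end
  return h
end

local h = tree_handler()
parse_events('<a x="1"><b>hi</b><c/></a>', h)
print(h.root.children[1].attrs.x)   --> 1
```

Keeping the parser and the tree builder separate means another handler (for example one that only counts elements) can reuse the same parsing loop.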

The current functionality is -

The limitations are -

The distribution also includes sample event handlers to convert the SAX event stream into a Lua table -

Classic Lua-only version

From: Roberto Ierusalimschy

I have this basic skeleton that parses the "main" part of an XML string (it does not handle meta-data like "<?" and "<!"...). -- Roberto
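Since the skeleton ignores these constructs, one possible workaround is a pre-pass that deletes them before the string reaches `collect`. This is a hypothetical Lua 5.1+ helper (`strip_meta` is not part of Roberto's code); it assumes the document has no CDATA sections and no DOCTYPE internal subset, both of which it would mangle.

```lua
-- Hypothetical pre-pass (not part of Roberto's code): remove the "<?...?>"
-- and "<!...>" constructs that the skeleton does not understand.
-- Assumes no CDATA sections and no DOCTYPE internal subset ("[...]").
function strip_meta(s)
  s = string.gsub(s, "<!%-%-.-%-%->", "")  -- comments (may contain '>')
  s = string.gsub(s, "<%?.-%?>", "")       -- XML declaration, processing instructions
  s = string.gsub(s, "<![^>]*>", "")       -- simple <!DOCTYPE ...> declarations
  return s
end

print(strip_meta('<?xml version="1.0"?><!-- note --><a>x</a>'))  --> <a>x</a>
```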

[!] VersionNotice: The code below pertains to an older Lua version, Lua 4. It does not run as-is under Lua 5.

function parseargs (s)
  local arg = {}
  gsub(s, "(%w+)=([\"'])(.-)%2", function (w, _, a)
    %arg[w] = a
  end)
  return arg
end

function collect (s)
  local stack = {n=0}
  local top = {n=0}
  tinsert(stack, top)
  local ni,c,label,args, empty
  local i, j = 1, 1
  while 1 do
    ni,j,c,label,args, empty = strfind(s, "<(%/?)(%w+)(.-)(%/?)>", i)
    if not ni then break end
    local text = strsub(s, i, ni-1)
    if not strfind(text, "^%s*$") then
      tinsert(top, text)
    end
    if empty == "/" then  -- empty element tag
      tinsert(top, {n=0, label=label, args=parseargs(args), empty=1})
    elseif c == "" then   -- start tag
      top = {n=0, label=label, args=parseargs(args)}
      tinsert(stack, top)   -- new level
    else  -- end tag
      local toclose = tremove(stack)  -- remove top
      top = stack[stack.n]
      if stack.n < 1 then
        error("nothing to close with "..label)
      end
      if toclose.label ~= label then
        error("trying to close "..toclose.label.." with "..label)
      end
      tinsert(top, toclose)
    end 
    i = j+1
  end
  local text = strsub(s, i)
  if not strfind(text, "^%s*$") then
    tinsert(stack[stack.n], text)
  end
  if stack.n > 1 then
    error("unclosed "..stack[stack.n].label)
  end
  return stack[1]
end


-- example

x = collect[[
     <methodCall kind="xuxu">
      <methodName>examples.getStateName</methodName>
      <params>
         <param>
            <value><i4>41</i4></value>
            </param>
         </params>
      </methodCall>
]]


Updated for Lua 5.1
function parseargs(s)
  local arg = {}
  string.gsub(s, "(%w+)=([\"'])(.-)%2", function (w, _, a)
    arg[w] = a
  end)
  return arg
end
    
function collect(s)
  local stack = {}
  local top = {}
  table.insert(stack, top)
  local ni,c,label,xarg, empty
  local i, j = 1, 1
  while true do
    ni,j,c,label,xarg, empty = string.find(s, "<(%/?)(%w+)(.-)(%/?)>", i)
    if not ni then break end
    local text = string.sub(s, i, ni-1)
    if not string.find(text, "^%s*$") then
      table.insert(top, text)
    end
    if empty == "/" then  -- empty element tag
      table.insert(top, {label=label, xarg=parseargs(xarg), empty=1})
    elseif c == "" then   -- start tag
      top = {label=label, xarg=parseargs(xarg)}
      table.insert(stack, top)   -- new level
    else  -- end tag
      local toclose = table.remove(stack)  -- remove top
      top = stack[#stack]
      if #stack < 1 then
        error("nothing to close with "..label)
      end
      if toclose.label ~= label then
        error("trying to close "..toclose.label.." with "..label)
      end
      table.insert(top, toclose)
    end
    i = j+1
  end
  local text = string.sub(s, i)
  if not string.find(text, "^%s*$") then
    -- Lua 5.1 tables have no automatic "n" field; use the length operator
    table.insert(stack[#stack], text)
  end
  if #stack > 1 then
    error("unclosed "..stack[#stack].label)
  end
  return stack[1]
end
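The table returned by `collect` can be walked recursively. As a hypothetical example (the name `toxml` is not part of the original code), the following Lua 5.1+ sketch writes such a tree back out as XML. The sample tree is hand-written in the shape `collect` produces for part of the `methodCall` example above, plus an illustrative empty element.

```lua
-- Hypothetical helper (not part of Roberto's code): turn a table built by
-- the Lua 5.1 collect() back into an XML string.
-- Text and attribute values are written verbatim, because collect() stores
-- them undecoded; attribute values are assumed to contain no '"'.
function toxml(node)
  if type(node) == "string" then return node end   -- text node
  local parts = {}
  for _, child in ipairs(node) do
    parts[#parts + 1] = toxml(child)
  end
  local inner = table.concat(parts)
  if not node.label then return inner end          -- the root table itself
  -- sort attribute names so the output does not depend on pairs() order
  local names = {}
  for k in pairs(node.xarg or {}) do names[#names + 1] = k end
  table.sort(names)
  local attrs = {}
  for _, k in ipairs(names) do
    attrs[#attrs + 1] = string.format(' %s="%s"', k, node.xarg[k])
  end
  local open = "<" .. node.label .. table.concat(attrs)
  if node.empty then return open .. "/>" end
  return open .. ">" .. inner .. "</" .. node.label .. ">"
end

local tree = {
  { label = "methodCall", xarg = { kind = "xuxu" },
    { label = "methodName", xarg = {}, "examples.getStateName" },
    { label = "flag", xarg = {}, empty = 1 } },
}
print(toxml(tree))
--> <methodCall kind="xuxu"><methodName>examples.getStateName</methodName><flag/></methodCall>
```

Because `collect` drops whitespace-only text, such a round trip does not preserve the original indentation.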

Original version

From: Yutaka Ueno

This is my test program, now revised [2] (link broken), but it has only been tested on a few XML files used in biology. Roberto's code probably provides a better skeleton than mine, but the two differ in how an XML tag is described as a Lua table:

XML :           <methodCall kind="xuxu">
Lua by Roberto: { label="methodCall", args={kind="xuxu"} }
Lua by Ueno:    { xml="methodCall", kind="xuxu" }

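I use the property name "xml" because it never appears as an attribute name in XML (names beginning with "xml" are reserved), so it should never clash with a real attribute. This layout works a bit better for the extremely deeply nested XML formats proposed in biology.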
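Converting between the two layouts is a short recursive walk. This hypothetical Lua 5.1+ sketch (`to_ueno` is an illustrative name) maps Roberto's tables to Ueno's; it assumes that in both layouts child elements and text stay in the array part of the table, which the comparison above does not show.

```lua
-- Hypothetical converter from Roberto's layout ({label=..., xarg={...}},
-- or args={...} in the Lua 4 version) to Ueno's flat layout
-- ({xml=..., attributes as fields}).
-- Assumption: children and text stay in the array part in both layouts.
function to_ueno(node)
  if type(node) ~= "table" then return node end   -- text is kept as-is
  local out = { xml = node.label }
  for k, v in pairs(node.xarg or node.args or {}) do
    out[k] = v    -- an attribute literally named "xml" would overwrite the tag
  end
  for i, child in ipairs(node) do
    out[i] = to_ueno(child)
  end
  return out
end

local t = to_ueno({ label = "methodCall", xarg = { kind = "xuxu" } })
print(t.xml .. " " .. t.kind)   --> methodCall xuxu
```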


C bindings to XML parsers

Kino

From: Eckhart Koeppen

Well, there is something I programmed which might help: the Kino XML processor. It has wrappers for Tcl and Lua via SWIG. An Xt widget and an experimental Gtk widget for displaying XML with CSS are also available. Take a look at it at [3].

It is under constant development but tries to stick to the DOM, so I hope that the interface changes remain small.

LuaGnome

luagnome [4] includes a wrapper for libxml-1.8.x, since libxml is considered part of GNOME. It can parse and generate XML files through a simple object-oriented API.

Expat

For Lua 5.0, use [LuaExpat], which is full-featured.

For Lua 4.0:

From: Jay Carlson

I've put a simple binding of expat, James Clark's stream-based XML parser written in C, up at [5]. No, not everything is bound, but it should be obvious how to bind more of expat.

XML-RPC

For Lua 5.0, an initial release of [LuaXMLRPC] is available from [The Kepler Project], although it is not yet bug-free.

For Lua 4.0:

From: Jay Carlson

I've put an initial release of client/server bindings for Lua for XML-RPC at [6]. It contains my lxp expat binding, and uses LuaSocket for client transport.

For more information on XML-RPC, see [7].

Although the packaging and documentation are scant, this package successfully passes the validation tests at [8].
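An XML-RPC request is itself a small XML document, like the `methodCall` used in Roberto's example above. The following hypothetical Lua 5.1+ sketch builds such a request body as a string. It is not the API of [LuaXMLRPC] or of the package at [6]; the function names are illustrative, only `int` (`<i4>`), `double`, `boolean` and `string` parameters are covered (no arrays, structs, dates or base64), and sending the request is not shown.

```lua
-- Hypothetical sketch (not the API of LuaXMLRPC or of the package above):
-- encode an XML-RPC <methodCall> request body in plain Lua 5.1+.
-- Only int (<i4>), double, boolean and string parameters are supported.
local function xml_escape(s)
  return (string.gsub(s, "[&<>]", { ["&"] = "&amp;", ["<"] = "&lt;", [">"] = "&gt;" }))
end

function xmlrpc_value(v)
  local t = type(v)
  if t == "boolean" then
    return "<boolean>" .. (v and "1" or "0") .. "</boolean>"
  elseif t == "number" then
    -- integral values that fit in 32 bits become <i4>, everything else <double>
    if v == math.floor(v) and v >= -2^31 and v < 2^31 then
      return string.format("<i4>%d</i4>", v)
    end
    return "<double>" .. tostring(v) .. "</double>"
  elseif t == "string" then
    return "<string>" .. xml_escape(v) .. "</string>"
  end
  error("unsupported XML-RPC parameter type: " .. t)
end

function xmlrpc_call(method, ...)
  local parts = { '<?xml version="1.0"?><methodCall><methodName>',
                  xml_escape(method), "</methodName><params>" }
  for i = 1, select("#", ...) do
    local v = select(i, ...)
    parts[#parts + 1] = "<param><value>" .. xmlrpc_value(v) .. "</value></param>"
  end
  parts[#parts + 1] = "</params></methodCall>"
  return table.concat(parts)
end

print(xmlrpc_call("examples.getStateName", 41))
```

The printed request carries the same method name and `<i4>41</i4>` parameter as Roberto's example document, without the indentation.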

XML-DOM parser

PugXML is a small, fast, non-validating DOM XML parser written in C++. It is contained in a single header and has no dependencies other than the standard C library and <iostream> (plus KERNEL32.DLL on Win32). The parser segments the given string in situ (like strtok), performing scanning, tokenization and parsing in a single pass.

Here is an example of the parser use in Lua:

-- create xml_parser object
parser = pug.xml_parser( pug.xml_parser.parse_default, true, 4);

-- parse string
xml_string = '<xml><child>some data </child><child2 attr="value"/></xml>';
print('parsing string: ' .. xml_string );
parser:parse(xml_string, pug.xml_parser.parse_noset);
print( tostring(parser:document()) );

-- Testing xml_node
-- getting root
root=parser:document();

-- add an element child
child=root:append_child( pug.xml_node_type.element );
print( tostring(root) );

-- name the new element 'child'
child:name('child');
print('child name is ' .. child:name() );
print( tostring(root) );

-- add attributes
child:append_attribute('attribute','value');
child:append_attribute('attribute2','value2');

-- add a grandchild element
child2=child:append_child( pug.xml_node_type.element );
child2:name('child2');
print( tostring(root) );

A wrapper around this parser has been written with [LuaBind] and is available at [9] (broken link). The original article about PugXML is located at [10].

TinyXML

For Lua 5.0:

From: Robert Noll

Just a plain "parse file to Lua table" function in C++, using the [TinyXML] library (version 2.4.3).

// header

struct lua_State;	// lua.h declares lua_State as a struct
	
/// register parser functions to lua
void	RegisterLuaXML (lua_State *L);


// sourcefile

#include "tinyxml.h"

extern "C" {
	#include "lua.h"
	#include "lauxlib.h"
	#include "lualib.h"
}

void LuaXML_ParseNode (lua_State *L,TiXmlNode* pNode) {
	if (!pNode) return;
	// grow the Lua stack if necessary (nodes are parsed recursively)
	luaL_checkstack(L, 5, "LuaXML_ParseNode : recursion too deep");
	
	TiXmlElement* pElem = pNode->ToElement();
	if (pElem) {
		// element name
		lua_pushstring(L,"name");
		lua_pushstring(L,pElem->Value());
		lua_settable(L,-3);
		
		// parse attributes
		TiXmlAttribute* pAttr = pElem->FirstAttribute();
		if (pAttr) {
			lua_pushstring(L,"attr");
			lua_newtable(L);
			for (;pAttr;pAttr = pAttr->Next()) {
				lua_pushstring(L,pAttr->Name());
				lua_pushstring(L,pAttr->Value());
				lua_settable(L,-3);
				
			}
			lua_settable(L,-3);
		}
	}
	
	// children
	TiXmlNode *pChild = pNode->FirstChild();
	if (pChild) {
		int iChildCount = 0;
		for(;pChild;pChild = pChild->NextSibling()) {
			switch (pChild->Type()) {
				case TiXmlNode::DOCUMENT: break;
				case TiXmlNode::ELEMENT: 
					// normal element, parse recursive
					lua_newtable(L);
					LuaXML_ParseNode(L,pChild);
					lua_rawseti(L,-2,++iChildCount);
				break;
				case TiXmlNode::COMMENT: break;
				case TiXmlNode::TEXT: 
					// plaintext, push raw
					lua_pushstring(L,pChild->Value());
					lua_rawseti(L,-2,++iChildCount);
				break;
				case TiXmlNode::DECLARATION: break;
				case TiXmlNode::UNKNOWN: break;
				default: break;	// e.g. TYPECOUNT; silences switch warnings
			}
		}
		lua_pushstring(L,"n");
		lua_pushnumber(L,iChildCount);
		lua_settable(L,-3);
	}
}

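static int LuaXML_ParseFile (lua_State *L) {
	const char* sFileName = luaL_checkstring(L,1);
	TiXmlDocument doc(sFileName);
	if (!doc.LoadFile()) {
		// load or parse error: return nil plus a message (usual Lua convention)
		lua_pushnil(L);
		lua_pushstring(L,doc.ErrorDesc());
		return 2;
	}
	lua_newtable(L);
	LuaXML_ParseNode(L,&doc);
	return 1;
}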

void	RegisterLuaXML (lua_State *L) {
	lua_register(L,"LuaXML_ParseFile",LuaXML_ParseFile);
}
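On the Lua side, the table built by `LuaXML_ParseFile` is an ordinary nested table: each element has a `name` field and an optional `attr` table, its children sit at indices 1..`n`, and text nodes are plain strings; the document root itself has no `name`. As a hypothetical example (`find_elements` is an illustrative name, and the sample table is hand-written in that shape), this sketch collects all elements with a given tag name. It avoids the `#` operator so it should also run under Lua 5.0.

```lua
-- Hypothetical helper for the table returned by LuaXML_ParseFile above:
-- elements have "name" and an optional "attr" table, children are stored at
-- indices 1..n (the C++ code sets "n" only when there are children), and
-- text nodes are plain strings. Avoids "#" so it also runs on Lua 5.0.
function find_elements(node, name, found)
  found = found or {}
  for i = 1, (node.n or 0) do
    local child = node[i]
    if type(child) == "table" then
      if child.name == name then table.insert(found, child) end
      find_elements(child, name, found)     -- search the subtree as well
    end
  end
  return found
end

-- local doc = LuaXML_ParseFile("config.xml")   -- hypothetical file name
local doc = { n = 1,
  { name = "config", n = 2,
    { name = "item", attr = { id = "a" } },
    { name = "item", attr = { id = "b" }, n = 1, "some text" } } }
for _, item in ipairs(find_elements(doc, "item")) do
  print(item.attr.id)   --> a, then b
end
```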

Last edited March 8, 2008 6:22 pm GMT