Lua Xml
Credit to the authors is mentioned where appropriate.
LazyKit is a collection of XML processing tools. Its primary purpose is to provoke discussion of XML tools in Lua.
PenlightLibraries provides an XML module [See Docs] which uses the LOM defined by LuaExpat and provides pretty-printing, template matching and Orbit-style 'htmlification'. It uses LuaExpat if available, and otherwise falls back on a pure Lua parser based on Roberto's (see below).
[xml2lua] works with Lua 5.0 to 5.3. It is an updated version based on [LuaXML for Lua 4] by Paul Chakravarti.
The module implements a non-validating XML stream parser with a handler-based event API (conceptually similar to SAX), which can be used to post-process the event data as required (e.g. into a tree).
The current functionality is -
The limitations are -
The distribution also includes sample event handlers to convert the SAX event stream into a Lua table -
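The idea of a handler that turns a SAX-like event stream into a Lua table can be sketched in a few lines. This is only an illustration: the handler and its callback names (starttag, text, endtag) are hypothetical and are not the xml2lua API, and the events are fired by hand instead of by a real parser.

```lua
-- Hypothetical tree-building handler for a SAX-like event stream.
-- The callback names are made up for this sketch.
local function new_tree_handler()
  local root = { children = {} }
  local stack = { root }            -- stack of currently open elements
  local h = { root = root }

  function h.starttag(name, attrs)
    local node = { name = name, attrs = attrs or {}, children = {} }
    local parent = stack[#stack]
    parent.children[#parent.children + 1] = node
    stack[#stack + 1] = node        -- descend into the new element
  end

  function h.text(s)
    local parent = stack[#stack]
    parent.children[#parent.children + 1] = s
  end

  function h.endtag(name)
    local node = table.remove(stack)
    if node.name ~= name then
      error("mismatched end tag: " .. tostring(name))
    end
  end

  return h
end

-- Simulate the events a parser would emit for <a x="1"><b>hi</b></a>
local h = new_tree_handler()
h.starttag("a", { x = "1" })
h.starttag("b")
h.text("hi")
h.endtag("b")
h.endtag("a")

local a = h.root.children[1]
print(a.name, a.attrs.x, a.children[1].name, a.children[1].children[1])
```

Keeping the tree builder separate from the tokenizer is what lets the same parser feed different handlers (a tree, a printer, a filter).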
Another pure-Lua non-validating SAX-like streaming processor. It also includes an implementation of a simple DOM parser (parse to hierarchy of tables).
https://github.com/Phrogz/SLAXML
Features:
Handles unescaped > in attribute values, e.g. cond="7 > 5" (but also incorrectly "supports" certain invalid XML, such as <foo></bar> or <a>5 < 7)
Unescapes named XML entities (&lt; &gt; &amp; &quot; &apos;) and numeric entities in attributes and text nodes (but, properly, not in comments or CDATA). Properly handles edge cases like &&.
Use local SLAXML = require 'slaxml' to load it.
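Entity handling of the kind listed above can be illustrated with a small pure-Lua function. This is a sketch and not SLAXML's code: the name unescape is hypothetical, and numeric entities are limited to ASCII to keep it short. It decodes everything in a single pass, so text that has already been decoded is never decoded a second time.

```lua
-- The five entities predefined by XML.
local named = { lt = "<", gt = ">", amp = "&", quot = '"', apos = "'" }

-- Decode named and (ASCII-only) numeric entities in one pass.
local function unescape(s)
  return (s:gsub("&(#?)(%w+);", function(hash, body)
    if hash == "#" then
      local code
      if body:sub(1, 1) == "x" then
        code = tonumber(body:sub(2), 16)   -- hexadecimal: &#x41;
      else
        code = tonumber(body, 10)          -- decimal: &#65;
      end
      if code and code < 128 then
        return string.char(code)
      end
      return nil                           -- leave the match unchanged
    end
    return named[body]                     -- nil keeps unknown entities as-is
  end))
end

print(unescape("a &lt; b &amp;&amp; c &#65;&#x42;"))
print(unescape("&#38;lt;"))
```

The second call shows the single-pass behaviour: &#38; becomes &, and the resulting &lt; is left alone.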
From: Roberto Ierusalimschy
I have this basic skeleton that parses the "main" part of an XML string (it
does not handle meta-data like "<?" and "<!"...).
-- Roberto
function parseargs (s)
  local arg = {}
  gsub(s, "(%w+)=([\"'])(.-)%2", function (w, _, a)
    %arg[w] = a
  end)
  return arg
end

function collect (s)
  local stack = {n=0}
  local top = {n=0}
  tinsert(stack, top)
  local ni,c,label,args, empty
  local i, j = 1, 1
  while 1 do
    ni,j,c,label,args, empty = strfind(s, "<(%/?)([%w:]+)(.-)(%/?)>", i)
    if not ni then break end
    local text = strsub(s, i, ni-1)
    if not strfind(text, "^%s*$") then
      tinsert(top, text)
    end
    if empty == "/" then  -- empty element tag
      tinsert(top, {n=0, label=label, args=parseargs(args), empty=1})
    elseif c == "" then   -- start tag
      top = {n=0, label=label, args=parseargs(args)}
      tinsert(stack, top)   -- new level
    else  -- end tag
      local toclose = tremove(stack)  -- remove top
      top = stack[stack.n]
      if stack.n < 1 then
        error("nothing to close with "..label)
      end
      if toclose.label ~= label then
        error("trying to close "..toclose.label.." with "..label)
      end
      tinsert(top, toclose)
    end
    i = j+1
  end
  local text = strsub(s, i)
  if not strfind(text, "^%s*$") then
    tinsert(stack[stack.n], text)
  end
  if stack.n > 1 then
    error("unclosed "..stack[stack.n].label)
  end
  return stack[1]
end

-- example

x = collect[[
     <methodCall kind="xuxu">
      <methodName>examples.getStateName</methodName>
      <params>
         <param>
            <value><i4>41</i4></value>
         </param>
      </params>
     </methodCall>
]]
-- The same skeleton using the string.* and table.* functions and the
-- # length operator (Lua 5.1 and later)
function parseargs(s)
  local arg = {}
  string.gsub(s, "([%-%w]+)=([\"'])(.-)%2", function (w, _, a)
    arg[w] = a
  end)
  return arg
end

function collect(s)
  local stack = {}
  local top = {}
  table.insert(stack, top)
  local ni,c,label,xarg, empty
  local i, j = 1, 1
  while true do
    ni,j,c,label,xarg, empty = string.find(s, "<(%/?)([%w:]+)(.-)(%/?)>", i)
    if not ni then break end
    local text = string.sub(s, i, ni-1)
    if not string.find(text, "^%s*$") then
      table.insert(top, text)
    end
    if empty == "/" then  -- empty element tag
      table.insert(top, {label=label, xarg=parseargs(xarg), empty=1})
    elseif c == "" then   -- start tag
      top = {label=label, xarg=parseargs(xarg)}
      table.insert(stack, top)   -- new level
    else  -- end tag
      local toclose = table.remove(stack)  -- remove top
      top = stack[#stack]
      if #stack < 1 then
        error("nothing to close with "..label)
      end
      if toclose.label ~= label then
        error("trying to close "..toclose.label.." with "..label)
      end
      table.insert(top, toclose)
    end
    i = j+1
  end
  local text = string.sub(s, i)
  if not string.find(text, "^%s*$") then
    table.insert(stack[#stack], text)
  end
  if #stack > 1 then
    error("unclosed "..stack[#stack].label)
  end
  return stack[1]
end
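To see what the table returned by collect is good for, here is a sketch that writes such a tree back out as XML. It assumes the layout used by the second version above (label, xarg, optional empty flag, children at integer indices); serialize is a hypothetical helper, and the sample tree is written by hand so that the snippet runs on its own.

```lua
-- Write a collect-style tree back out as an XML string.
-- Text and attribute values are emitted verbatim, because collect does not
-- decode entities in the first place.
local function serialize(node, out)
  out = out or {}
  if type(node) == "string" then
    out[#out + 1] = node
    return out
  end
  if node.label then
    out[#out + 1] = "<" .. node.label
    -- sort attribute names so the output is stable
    local names = {}
    for k in pairs(node.xarg or {}) do names[#names + 1] = k end
    table.sort(names)
    for _, k in ipairs(names) do
      out[#out + 1] = " " .. k .. '="' .. node.xarg[k] .. '"'
    end
    if node.empty then
      out[#out + 1] = "/>"
      return out
    end
    out[#out + 1] = ">"
  end
  for _, child in ipairs(node) do
    serialize(child, out)
  end
  if node.label then
    out[#out + 1] = "</" .. node.label .. ">"
  end
  return out
end

-- Hand-written stand-in for the result of collect(...)
local tree = {
  { label = "methodCall", xarg = { kind = "xuxu" },
    { label = "methodName", xarg = {}, "examples.getStateName" },
    { label = "br", xarg = {}, empty = 1 },
  },
}

print(table.concat(serialize(tree)))
```

The unlabeled outer table plays the role of the document root that collect returns as stack[1].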
From: Yutaka Ueno
That is my test program, which is now revised [1] (link broken). However, it has only been tested on a few XML files used in biology. Roberto's code probably provides a better skeleton than mine, but the two differ in how XML tags are described as Lua tables.
XML : <methodCall kind="xuxu">
Lua by Roberto: { label="methodCall", args={kind="xuxu"} }
Lua by Ueno: { xml="methodCall", kind="xuxu" }
This works because the property name "xml" never appears as an attribute name in XML, so it cannot collide with a real attribute. This method is a bit better for the terribly deep XML tag hierarchies proposed in biology.
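The difference between the two layouts is easy to see in code. The following sketch converts a node in Roberto's layout (label plus an args table) into the flat layout (an xml key plus the attributes as direct fields); to_flat is a hypothetical name used only for this illustration.

```lua
-- Convert { label=..., args={...}, children... } into
-- { xml=..., attr1=..., attr2=..., children... }
local function to_flat(node)
  if type(node) ~= "table" then
    return node                       -- text nodes are plain strings
  end
  local out = { xml = node.label }
  for k, v in pairs(node.args or {}) do
    out[k] = v                        -- attributes become direct fields
  end
  for i, child in ipairs(node) do
    out[i] = to_flat(child)           -- children keep their positions
  end
  return out
end

local nested = { label = "methodCall", args = { kind = "xuxu" },
                 { label = "methodName", args = {}, "examples.getStateName" } }
local flat = to_flat(nested)
print(flat.xml, flat.kind, flat[1].xml, flat[1][1])
```

With the flat layout an attribute is reached as node.kind instead of node.args.kind, which saves one table per element in deeply nested documents.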
From: Eckhart Koeppen
Well, there is something I programmed which might help, the Kino XML processor. It has wrappers for Tcl and Lua via SWIG. An Xt and an experimental Gtk widget for displaying XML with CSS are also available. Take a look at it at: [2] (link broken)
It is under constant development but tries to stick to the DOM, so I hope that the interface changes remain small.
[luagnome] (link broken, use [3]) includes a wrapper for libxml-1.8.x, as it is considered a part of Gnome. It can parse and generate XML files through a simple object-oriented API.
[lua-xmlreader] is an implementation of the XmlReader API using libxml2.
For Lua 5.0/5.1, use [LuaExpat], which is full-featured.
For Lua 4.0:
From: Jay Carlson
I've put a simple binding of expat, James Clark's C stream-based XML parser up at [4]. No, not everything is bound, but it should be obvious how to bind more stuff to it.
[LuaXML] is a lean yet complete module for the direct mapping between XML data and Lua tables.
PugXML is a small, fast, non-validating DOM XML parser written in C++, contained in a single header, with no dependencies other than the standard C libraries and <iostream> (KERNEL32.DLL with WIN32). This XML parser segments a given string in situ (like strtok), performing scanning/tokenization and parsing in a single pass.
Here is an example of the parser use in Lua:
-- create xml_parser object
parser = pug.xml_parser( pug.xml_parser.parse_default, true, 4);

-- parse string
xml_string = '<xml><child>some data </child><child2 attr="value"/></xml>';
print('parsing string: ' .. xml_string );
parser:parse(xml_string, pug.xml_parser.parse_noset);
print( tostring(parser:document()) );

-- Testing xml_node
-- getting root
root=parser:document();

-- add a element child
child=root:append_child( pug.xml_node_type.element );
print( tostring(root) );

-- rename child to child
child:name('child');
print('child name is ' .. child:name() );
print( tostring(root) );

-- adding attributes
child:append_attribute('attribute','value');
child:append_attribute('attribute2','value2');

-- adding on children
child2=child:append_child( pug.xml_node_type.element );
child2:name('child2');
print( tostring(root) );
A wrapper around this parser has been written with [LuaBind] and is available at [5] (link broken). The original article about PugXML is located at [6].
For Lua 5.0:
From: Robert Noll
Just a plain "parse file to Lua array" function in C++, using the [TinyXML] (2.4.3) lib.
// header
struct lua_State;
/// register parser functions to lua
void RegisterLuaXML (lua_State *L);

// sourcefile
#include "tinyxml.h"
extern "C" {
    #include "lua.h"
    #include "lauxlib.h"
    #include "lualib.h"
}

// PROFILE is a profiling macro from the author's own project;
// define it as empty when it is not available
#ifndef PROFILE
#define PROFILE
#endif

/// fills the table on top of the stack with the contents of pNode
void LuaXML_ParseNode (lua_State *L,TiXmlNode* pNode) { PROFILE
    if (!pNode) return;
    // resize stack if necessary
    luaL_checkstack(L, 5, "LuaXML_ParseNode : recursion too deep");

    TiXmlElement* pElem = pNode->ToElement();
    if (pElem) {
        // element name
        lua_pushstring(L,"name");
        lua_pushstring(L,pElem->Value());
        lua_settable(L,-3);

        // parse attributes
        TiXmlAttribute* pAttr = pElem->FirstAttribute();
        if (pAttr) {
            lua_pushstring(L,"attr");
            lua_newtable(L);
            for (;pAttr;pAttr = pAttr->Next()) {
                lua_pushstring(L,pAttr->Name());
                lua_pushstring(L,pAttr->Value());
                lua_settable(L,-3);
            }
            lua_settable(L,-3);
        }
    }

    // children
    TiXmlNode *pChild = pNode->FirstChild();
    if (pChild) {
        int iChildCount = 0;
        for(;pChild;pChild = pChild->NextSibling()) {
            switch (pChild->Type()) {
                case TiXmlNode::DOCUMENT: break;
                case TiXmlNode::ELEMENT:
                    // normal element, parse recursive
                    lua_newtable(L);
                    LuaXML_ParseNode(L,pChild);
                    lua_rawseti(L,-2,++iChildCount);
                break;
                case TiXmlNode::COMMENT: break;
                case TiXmlNode::TEXT:
                    // plaintext, push raw
                    lua_pushstring(L,pChild->Value());
                    lua_rawseti(L,-2,++iChildCount);
                break;
                case TiXmlNode::DECLARATION: break;
                case TiXmlNode::UNKNOWN: break;
            }
        }
        // store the number of children in field "n"
        lua_pushstring(L,"n");
        lua_pushnumber(L,iChildCount);
        lua_settable(L,-3);
    }
}

/// Lua: table = LuaXML_ParseFile(filename), or nil plus an error message
static int LuaXML_ParseFile (lua_State *L) { PROFILE
    const char* sFileName = luaL_checkstring(L,1);
    TiXmlDocument doc(sFileName);
    if (!doc.LoadFile()) {
        // report load/parse failures instead of returning an empty table
        lua_pushnil(L);
        lua_pushstring(L,doc.ErrorDesc());
        return 2;
    }
    lua_newtable(L);
    LuaXML_ParseNode(L,&doc);
    return 1;
}

void RegisterLuaXML (lua_State *L) {
    lua_register(L,"LuaXML_ParseFile",LuaXML_ParseFile);
}
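On the Lua side, the table built by LuaXML_ParseNode can be walked with a few lines of code. The sketch below assumes the layout produced by the C++ code above (name, an optional attr table, children at indices 1..n, and the child count in n). The sample table is written by hand to stand in for a parsed file, and find_all is a hypothetical helper name.

```lua
-- Collect every element with the given name, depth first.
local function find_all(node, name, found)
  found = found or {}
  if type(node) ~= "table" then
    return found                      -- text children are plain strings
  end
  if node.name == name then
    found[#found + 1] = node
  end
  for i = 1, (node.n or 0) do         -- n is only set when there are children
    find_all(node[i], name, found)
  end
  return found
end

-- Hand-written stand-in for the result of LuaXML_ParseFile("items.xml")
local doc = {
  n = 1,
  { name = "items", n = 2,
    { name = "item", attr = { id = "1" }, n = 1, "first" },
    { name = "item", attr = { id = "2" } },
  },
}

for _, item in ipairs(find_all(doc, "item")) do
  print(item.attr.id, item[1])
end
```

Note that attr is missing on elements without attributes and n is missing on elements without children, so code that reads them should allow for nil.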
https://github.com/d-led/pugilua
Example:
require 'pugilua'

---- reading ----
local doc=pugi.xml_document()
local res=doc:load_file [[..\..\scripts\pugilua\pugilua.vcxproj]]

print(res.description)

local node1=doc:root():child('Project')
local query1=doc:root():select_nodes('Project/PropertyGroup')

local n=query1.size
for i=0,n-1 do
    local node=query1:get(i):node()
    local attribute=query1:get(i):attribute()
    print(node.valid,node.path)
    local a=node:first_attribute()
    while a.valid do
        print(a.name)
        a=a:next_attribute()
    end
end

---- creating ----
doc:reset()
--- from the tutorial
-- add node with some name
local node = doc:root():append_child("node")

-- add description node with text child
local descr = node:append_child("description")
descr:append(pugi.node_pcdata):set_value("Simple node")

-- add param node before the description
local param = node:insert_child_before("param", descr)

-- add attributes to param node
param:append_attribute("name"):set_value("version")
param:append_attribute("value"):set_value(1.1)
param:insert_attribute_after("type", param:attribute("name")):set_value("float")

doc:save_file("tutorial.xml")
As a supplement to pugilua, there is an effort to provide a minimal binding to [Xerces-C++] (xerces.apache.org/xerces-c/) in order to validate XML documents:
https://github.com/d-led/xerceslua
assert(require 'xerceslua')
Example:
local parser=xerces.XercesDOMParser()
parser:loadGrammar("Employee.dtd",xerces.GrammarType.DTDGrammarType)
parser:setValidationScheme(xerces.ValSchemes.Val_Auto)

local log=parser:parse("Employeexy.xml")
print(log.Ok)
if not log.Ok then
    print(log.Count)
    for i=0,log.Count-1 do
        local err=log:GetLogEntry(i)
        print(err.SystemId..', l:'..err.LineNumber..', c:'..err.ColumnNumber..', e:'..err.Message,err.LogType)
    end
end
For Lua 5.0/5.1, use [LuaXMLRPC] library developed by [The Kepler Project].
For Lua 4.0:
From: Jay Carlson
I've put an initial release of client/server bindings for Lua for XML-RPC at [7]. It contains my lxp expat binding, and uses LuaSocket for client transport.
For more information on XML-RPC, see [8].
Although the packaging and documentation are scant, this package successfully passes the validation tests at [9].
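For readers who have not seen the wire format, an XML-RPC request is just a small XML document. The sketch below builds one by hand in plain Lua. It is not the API of any of the libraries mentioned here: xmlrpc_call and xmlrpc_value are hypothetical names, and only numbers, booleans and strings are handled.

```lua
-- Escape the characters that must not appear raw in element content.
local function escape(s)
  return (s:gsub("[&<]", { ["&"] = "&amp;", ["<"] = "&lt;" }))
end

-- Encode one Lua value as an XML-RPC <value> element.
local function xmlrpc_value(v)
  local t = type(v)
  if t == "number" then
    -- whole numbers in the 32-bit range are sent as <i4>
    if v % 1 == 0 and v >= -2^31 and v < 2^31 then
      return string.format("<value><i4>%d</i4></value>", v)
    end
    -- note: tostring may use exponent notation for very large or small values
    return "<value><double>" .. tostring(v) .. "</double></value>"
  elseif t == "boolean" then
    return "<value><boolean>" .. (v and "1" or "0") .. "</boolean></value>"
  elseif t == "string" then
    return "<value><string>" .. escape(v) .. "</string></value>"
  end
  error("unsupported type: " .. t)
end

-- Build the body of a methodCall request.
local function xmlrpc_call(method, ...)
  local parts = {
    '<?xml version="1.0"?>',
    "<methodCall>",
    "<methodName>" .. escape(method) .. "</methodName>",
    "<params>",
  }
  for i = 1, select("#", ...) do
    parts[#parts + 1] = "<param>" .. xmlrpc_value((select(i, ...))) .. "</param>"
  end
  parts[#parts + 1] = "</params>"
  parts[#parts + 1] = "</methodCall>"
  return table.concat(parts)
end

print(xmlrpc_call("examples.getStateName", 41))
```

A real client would send this string as the body of an HTTP POST and parse the methodResponse document that comes back.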
[LuaSOAP] is a Lua library to ease the use of SOAP.
For Lua 5.1:
From: Alexander Makeev
This XmlParser builds an object tree similar to a C# XmlDocument with XmlNodes. See the example for details.
-----------------------------------------------------------------------------------------
-- LUA only XmlParser from Alexander Makeev
-----------------------------------------------------------------------------------------
XmlParser = {};

function XmlParser:ToXmlString(value)
  value = string.gsub (value, "&", "&amp;");    -- '&' -> "&amp;"
  value = string.gsub (value, "<", "&lt;");     -- '<' -> "&lt;"
  value = string.gsub (value, ">", "&gt;");     -- '>' -> "&gt;"
  --value = string.gsub (value, "'", "&apos;"); -- '\'' -> "&apos;"
  value = string.gsub (value, "\"", "&quot;");  -- '"' -> "&quot;"
  -- replace non printable char -> "&#xD;&#xA;"
  value = string.gsub(value, "([^%w%&%;%p%\t% ])",
    function (c)
      return string.format("&#x%X;", string.byte(c))
      --return string.format("&#x%02X;", string.byte(c))
      --return string.format("&#%02d;", string.byte(c))
    end);
  return value;
end

function XmlParser:FromXmlString(value)
  value = string.gsub(value, "&#x([%x]+)%;",
    function(h)
      return string.char(tonumber(h,16))
    end);
  value = string.gsub(value, "&#([0-9]+)%;",
    function(h)
      return string.char(tonumber(h,10))
    end);
  value = string.gsub (value, "&quot;", "\"");
  value = string.gsub (value, "&apos;", "'");
  value = string.gsub (value, "&gt;", ">");
  value = string.gsub (value, "&lt;", "<");
  value = string.gsub (value, "&amp;", "&");  -- must come last
  return value;
end

function XmlParser:ParseArgs(s)
  local arg = {}
  string.gsub(s, "(%w+)=([\"'])(.-)%2", function (w, _, a)
    arg[w] = self:FromXmlString(a);
  end)
  return arg
end

function XmlParser:ParseXmlText(xmlText)
  local stack = {}
  local top = {Name=nil,Value=nil,Attributes={},ChildNodes={}}
  table.insert(stack, top)
  local ni,c,label,xarg, empty
  local i, j = 1, 1
  while true do
    ni,j,c,label,xarg, empty = string.find(xmlText, "<(%/?)([%w:]+)(.-)(%/?)>", i)
    if not ni then break end
    local text = string.sub(xmlText, i, ni-1);
    if not string.find(text, "^%s*$") then
      top.Value=(top.Value or "")..self:FromXmlString(text);
    end
    if empty == "/" then  -- empty element tag
      table.insert(top.ChildNodes, {Name=label,Value=nil,Attributes=self:ParseArgs(xarg),ChildNodes={}})
    elseif c == "" then   -- start tag
      top = {Name=label, Value=nil, Attributes=self:ParseArgs(xarg), ChildNodes={}}
      table.insert(stack, top)   -- new level
      --log("openTag ="..top.Name);
    else  -- end tag
      local toclose = table.remove(stack)  -- remove top
      --log("closeTag="..toclose.Name);
      top = stack[#stack]
      if #stack < 1 then
        error("XmlParser: nothing to close with "..label)
      end
      if toclose.Name ~= label then
        error("XmlParser: trying to close "..toclose.Name.." with "..label)
      end
      table.insert(top.ChildNodes, toclose)
    end
    i = j+1
  end
  local text = string.sub(xmlText, i);
  if not string.find(text, "^%s*$") then
    stack[#stack].Value=(stack[#stack].Value or "")..self:FromXmlString(text);
  end
  if #stack > 1 then
    error("XmlParser: unclosed "..stack[#stack].Name)
  end
  return stack[1].ChildNodes[1];
end

function XmlParser:ParseXmlFile(xmlFileName)
  local hFile,err = io.open(xmlFileName,"r");
  if (not err) then
    local xmlText=hFile:read("*a"); -- read file content
    io.close(hFile);
    return self:ParseXmlText(xmlText),nil;
  else
    return nil,err;
  end
end
-----------------------------------------------------------------------------------------
example:
-- log() is provided by the author's host application (it prefixes each line
-- with "<log>" in the result below); substitute print if it is not available
function dump(_class, no_func, depth)
  if(not _class) then
    log("nil");
    return;
  end

  if(depth==nil) then depth=0; end
  local str="";
  for n=0,depth,1 do
    str=str.."\t";
  end

  log(str.."["..type(_class).."]");
  log(str.."{");

  for i,field in pairs(_class) do
    if(type(field)=="table") then
      log(str.."\t"..tostring(i).." =");
      dump(field, no_func, depth+1);
    else
      if(type(field)=="number") then
        log(str.."\t"..tostring(i).."="..field);
      elseif(type(field) == "string") then
        log(str.."\t"..tostring(i).."=".."\""..field.."\"");
      elseif(type(field) == "boolean") then
        log(str.."\t"..tostring(i).."=".."\""..tostring(field).."\"");
      else
        if(not no_func)then
          if(type(field)=="function")then
            log(str.."\t"..tostring(i).."()");
          else
            log(str.."\t"..tostring(i).."<userdata=["..type(field).."]>");
          end
        end
      end
    end
  end
  log(str.."}");
end

--local obj,err = XmlParser:ParseXmlFile("test.xml");
--if(not err) then
--  dump(obj);
--else
--  log("ERROR: "..err);
--end

local xmlTree=XmlParser:ParseXmlText([[<?xml version="1.0" encoding="utf-8"?>
<Config>
  <EntityList>
    <Entity value="1&quot;2&quot;3">innerText</Entity>
    <Entity value="456"/>
  </EntityList>
</Config>
]])

for i,xmlNode in pairs(xmlTree.ChildNodes) do
  if(xmlNode.Name=="EntityList") then
    for i,subXmlNode in pairs(xmlNode.ChildNodes) do
      if(subXmlNode.Name=="Entity") then
        log("Entity value=\""..subXmlNode.Attributes.value.."\"");
        if(subXmlNode.Value) then
          log(" Content=\""..subXmlNode.Value.."\"");
        end
      end
    end
  end
end

dump(xmlTree)
result:
<log>Entity value="1"2"3"
<log> Content="innerText"
<log>Entity value="456"
<log>  [table]
<log>  {
<log>    Attributes =
<log>    [table]
<log>    {
<log>    }
<log>    Name="Config"
<log>    ChildNodes =
<log>    [table]
<log>    {
<log>      1 =
<log>      [table]
<log>      {
<log>        Attributes =
<log>        [table]
<log>        {
<log>        }
<log>        Name="EntityList"
<log>        ChildNodes =
<log>        [table]
<log>        {
<log>          1 =
<log>          [table]
<log>          {
<log>            Value="innerText"
<log>            Attributes =
<log>            [table]
<log>            {
<log>              value="1"2"3"
<log>            }
<log>            Name="Entity"
<log>            ChildNodes =
<log>            [table]
<log>            {
<log>            }
<log>          }
<log>          2 =
<log>          [table]
<log>          {
<log>            Attributes =
<log>            [table]
<log>            {
<log>              value="456"
<log>            }
<log>            Name="Entity"
<log>            ChildNodes =
<log>            [table]
<log>            {
<log>            }
<log>          }
<log>        }
<log>      }
<log>    }
<log>  }
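Once a tree in this Name/Attributes/ChildNodes layout is available, the nested loops in the example can be replaced by a small path lookup. The following is a sketch: select_path is a hypothetical helper that is not part of XmlParser, and the tree is written by hand to mirror the dump above so that the snippet runs without the parser.

```lua
-- Return all nodes reached by following a "/"-separated list of child names.
local function select_path(node, path)
  local current = { node }
  for name in path:gmatch("[^/]+") do
    local next_level = {}
    for _, n in ipairs(current) do
      for _, child in ipairs(n.ChildNodes or {}) do
        if child.Name == name then
          next_level[#next_level + 1] = child
        end
      end
    end
    current = next_level
  end
  return current
end

-- Hand-written copy of the tree shown in the dump above
local config = {
  Name = "Config", Attributes = {}, ChildNodes = {
    { Name = "EntityList", Attributes = {}, ChildNodes = {
        { Name = "Entity", Value = "innerText",
          Attributes = { value = '1"2"3' }, ChildNodes = {} },
        { Name = "Entity", Attributes = { value = "456" }, ChildNodes = {} },
    } },
  },
}

for _, entity in ipairs(select_path(config, "EntityList/Entity")) do
  print(entity.Attributes.value, entity.Value)
end
```

The path is relative to the node passed in, so the root element's own name (Config here) is not part of it.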