Xml Iter

xmliter: iterate over XML trees

(This is part of LazyKit.)

This package supplies a number of tools for iterating through child elements of an XmlTree.

xattrpairs(tree)

Return an iterator over the attributes of tree, yielding attribute names and values. Note that only keys of type string are returned. (LuaExpat uses numeric keys to mark attributes that were defaulted from the DTD.)
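The string-key filtering can be pictured with a small stand-alone sketch. This is not LazyKit's code: xattrpairs_sketch is a hypothetical name, and the tree is a hand-built table that assumes attributes live in tree.attr, as the examples on this page suggest.

```lua
-- Hypothetical stand-in for xattrpairs; not LazyKit's actual implementation.
-- Assumes attributes are stored in tree.attr.
local function xattrpairs_sketch(tree)
  local attr = tree.attr or {}
  local function iter(t, key)
    local value
    repeat
      key, value = next(t, key)
    until key == nil or type(key) == "string"  -- skip non-string keys
    return key, value
  end
  return iter, attr, nil
end

-- Hand-built element standing in for <entry time="12:30" id="0"/>.
-- The numeric key imitates the kind of non-string entry mentioned above.
local entry = { name = "entry", attr = { "id", time = "12:30", id = "0" } }

local seen = {}
for k, v in xattrpairs_sketch(entry) do
  seen[k] = v
end
print(seen.time, seen.id)
```

Only the time and id entries reach the loop body; the numeric key is skipped.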

xmliter.getn(tree)

Counts the children of tree; roughly equivalent to table.getn. This is necessary because table.getn(tree) does not look up tree.n through ordinary indexing; it uses rawget(tree, "n"), which bypasses metatables. Fancy tree implementations may need a metamethod call to find the number of children.
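The difference between a raw lookup and ordinary indexing can be shown with a small sketch. The names getn_sketch and lazy are made up for illustration, and the fallback counting loop is an assumption, not something xmliter is known to do.

```lua
-- Hypothetical helper, not the real xmliter.getn.
local function getn_sketch(tree)
  local n = tree.n              -- ordinary indexing, so __index is honoured
  if type(n) == "number" then
    return n
  end
  -- Assumed fallback: count consecutive integer keys.
  local count = 0
  while tree[count + 1] ~= nil do
    count = count + 1
  end
  return count
end

-- A "fancy" tree whose child count lives behind a metatable.
local lazy = setmetatable({ "a", "b" }, {
  __index = function(t, k)
    if k == "n" then return 2 end
  end,
})

print(rawget(lazy, "n"))   -- nil: the raw lookup misses the computed field
print(getn_sketch(lazy))   -- 2
```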

xpairs(tree)

Return an iterator over tree that returns each index and its child. Example:

parent = lazytree.parsestring("<p>a<z>cdef</z>b</p>")

for i,x in xpairs(parent) do
  if type(x) == "string" then
    print("string:", x)
  else
    print("tag:", x.name)
  end
end 

prints:

string:	a
tag:	z
string:	b 

Note that it does not descend into child elements (as "cdef" was not printed).
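For readers without LazyKit at hand, a minimal sketch of an iterator with this behaviour follows. It is an illustration only: xpairs_sketch is a hypothetical name, the tree is built by hand rather than parsed, and the # operator assumes Lua 5.1 or later.

```lua
-- Hypothetical sketch of an xpairs-like iterator (not LazyKit's code).
local function xpairs_sketch(tree)
  local n = tree.n or #tree
  local function iter(t, i)
    i = i + 1
    if i <= n then
      return i, t[i]
    end
  end
  return iter, tree, 0
end

-- Hand-built stand-in for <p>a<z>cdef</z>b</p>.
local parent = {
  name = "p", attr = {}, n = 3,
  "a",
  { name = "z", attr = {}, n = 1, "cdef" },
  "b",
}

local lines = {}
for i, x in xpairs_sketch(parent) do
  if type(x) == "string" then
    lines[#lines + 1] = "string: " .. x
  else
    lines[#lines + 1] = "tag: " .. x.name
  end
end
print(table.concat(lines, "\n"))
```

The loop visits only the three direct children, so "cdef" never appears.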

xnpairs(tree)

Return an iterator over tree that ignores character data elements. It returns an index, subtree, and element name (which may be ignored):

for i,x in xnpairs(parent) do
  print("tag:", x.name)
end

for i,x,name in xnpairs(parent) do
  print("tag:", name)
end 

Either of the above prints:

tag:	z 
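A rough sketch of how such a filtering iterator could be written is given below. It is not the LazyKit source: xnpairs_sketch is a hypothetical name, the tree is hand-built, and the # operator assumes Lua 5.1 or later.

```lua
-- Hypothetical sketch of an xnpairs-like iterator (not LazyKit's code).
local function xnpairs_sketch(tree)
  local n = tree.n or #tree
  local function iter(t, i)
    repeat
      i = i + 1
    until i > n or type(t[i]) == "table"   -- skip character data
    if i <= n then
      return i, t[i], t[i].name
    end
  end
  return iter, tree, 0
end

-- Hand-built stand-in for <p>a<z>cdef</z>b</p>.
local parent = {
  name = "p", attr = {}, n = 3,
  "a",
  { name = "z", attr = {}, n = 1, "cdef" },
  "b",
}

local names, indices = {}, {}
for i, x, name in xnpairs_sketch(parent) do
  names[#names + 1] = name
  indices[#indices + 1] = i
end
print(names[1], indices[1])
```

Note that the index returned is the child's position in the parent (2 here), not a count of elements seen so far; that choice is an assumption of this sketch.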

Generic filtering

xmliter.switch(parent, ftable, [opts])

Iterate through the children of parent, using function definitions from ftable.

Each child of parent is looked up in ftable. For a child "<foo/>", the function ftable.foo(child, parent) is called. For character data, ftable[""](str, parent) is called. If an unknown tag is found, the function ftable[true](parent, child) is called.

If no such entry exists in ftable, the child is ignored (unless the no_chardata or no_tags options described under "Options processing" are set).

If the handler returns a true value, switch stops iterating and returns a (possibly different) true value along with any second return value. (Interaction with consumption TBD, and possibly using the first return value as a count of how many levels to escape out of.)
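The dispatch and early-exit rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not xmliter.switch itself: switch_sketch is a hypothetical name, the tree is hand-built, only function handlers are supported (no nested ftables, no options), and the fallback handler receives its arguments in the order given in the text.

```lua
-- Hypothetical re-implementation for illustration; not xmliter.switch.
local function switch_sketch(parent, ftable)
  for i = 1, (parent.n or #parent) do
    local child = parent[i]
    local stop, extra
    if type(child) == "string" then
      local h = ftable[""]
      if h then stop, extra = h(child, parent) end
    elseif ftable[child.name] then
      stop, extra = ftable[child.name](child, parent)
    elseif ftable[true] then
      -- Argument order follows the description above.
      stop, extra = ftable[true](parent, child)
    end
    if stop then
      return true, extra   -- stop iterating, pass the second value along
    end
  end
end

-- Hand-built stand-in for
-- <log><entry time="12:30"/><checkpoint/><entry time="12:35"/></log>
local log = {
  name = "log", attr = {}, n = 3,
  { name = "entry", attr = { time = "12:30" }, n = 0 },
  { name = "checkpoint", attr = {}, n = 0 },
  { name = "entry", attr = { time = "12:35" }, n = 0 },
}

local visited = {}
local found, where = switch_sketch(log, {
  entry = function(entry)
    visited[#visited + 1] = entry.attr.time
  end,
  checkpoint = function(cp, parent)
    return true, "checkpoint in " .. parent.name   -- stop early
  end,
})
print(found, where, #visited)
```

Because the checkpoint handler returns a true value, the second entry is never visited.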

Example:

s = '<log><entry time="12:30"/><checkpoint/><entry time="12:35"/></log>'
parent = lazytree.parsestring(s)
ftable = {
  entry=function (entry, parent)
          print (entry.attr.time)
        end
}
xmliter.switch(parent, ftable) 

prints:

12:30
12:35 

(Note that since we do not care about the parent, the function could have been declared as "function (entry)".)

Entries may contain nested ftables instead of functions; switch (or switch_c) is called recursively with the nested ftable.

Example:

s = [[
<log>
  <entry id='0'>
    <time clock="12:50"/>
    <msg text="foo"/>
    <extra/>
  </entry>
</log>]]
parent = lazytree.parsestring(s)
ftable = {
  entry={
    time=function (time)
           print (time.attr.clock)
         end;
    msg=function (msg)
          print (msg.attr.text)
        end;
  }
}
xmliter.switch(parent, ftable) 

prints:

12:50
foo 

As an aid to use of nested ftables, ftable[0](parent, [previous_parent]) is called before any children are processed, and ftable[-1](parent, [previous_parent]) is called after all children have been processed:

parent = lazytree.parsestring(s)
ftable = {
  entry={
    [0]=function (entry)
      print("id ", entry.attr.id)
      entry.message_txt = "(no message)"
      entry.time_txt = "(no time)"
      entry.level_txt = "(no level)"
    end;
    time=function (time, entry)
      entry.time_txt = time.attr.clock
    end;
    msg=function (msg, entry)
      entry.message_txt = msg.attr.text
    end;
    [-1]=function (entry)
      print("message", entry.message_txt, entry.time_txt, entry.level_txt)
    end;
  }
}
xmliter.switch(parent, ftable) 

prints:

id 	0
message	foo	12:50	(no level) 

This takes advantage of the fact that XML trees do not mind extraneous table entries, as long as you avoid the keys "n", "attr", and "name" and any key starting with an underscore.

Nested tables may not be the most concise way to express code, however. A simpler way of writing the previous example would be:

parent = lazytree.parsestring(s)
ftable = {
  entry=function (entry)
    print("id", entry.attr.id)
    local v = xmlview.element(entry)
    local message_txt = "(no message)"
    local time_txt = "(no time)"
    local level_txt = "(no level)"
    if v.time then time_txt = v.time.attr.clock end
    if v.msg then message_txt = v.msg.attr.text end
    print("message", message_txt, time_txt, level_txt)
  end
}
xmliter.switch(parent, ftable) 

Any use of [0] and [-1] may be rewritten in terms of a function that performs the [0] action, recursively calls switch, and performs the [-1] action.
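As a sketch of that rewrite, the [0]/[-1] example can be expressed with a single entry handler that sets defaults, recurses, and then reports. To keep the block runnable without LazyKit it uses switch_sketch, a hypothetical stripped-down dispatcher that only supports function handlers, and a hand-built tree; with the real library one would call xmliter.switch on a parsed tree instead.

```lua
-- Hypothetical minimal dispatcher (function handlers only); not xmliter.switch.
local function switch_sketch(parent, ftable)
  for i = 1, (parent.n or #parent) do
    local child = parent[i]
    local h = ftable[type(child) == "string" and "" or child.name]
    if h then h(child, parent) end
  end
end

-- Hand-built stand-in for the <log><entry id='0'>...</entry></log> example.
local log = {
  name = "log", attr = {}, n = 1,
  { name = "entry", attr = { id = "0" }, n = 3,
    { name = "time", attr = { clock = "12:50" }, n = 0 },
    { name = "msg", attr = { text = "foo" }, n = 0 },
    { name = "extra", attr = {}, n = 0 },
  },
}

local entry_handlers = {
  time = function(time, entry) entry.time_txt = time.attr.clock end,
  msg = function(msg, entry) entry.message_txt = msg.attr.text end,
}

local output = {}
local ftable = {
  entry = function(entry)
    -- What the [0] handler did: set defaults.
    entry.message_txt = "(no message)"
    entry.time_txt = "(no time)"
    -- Recurse into the children.
    switch_sketch(entry, entry_handlers)
    -- What the [-1] handler did: report the result.
    output[#output + 1] = entry.message_txt .. " " .. entry.time_txt
  end,
}
switch_sketch(log, ftable)
print(output[1])
```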

Recursive searches for elements can be performed by setting the [true] action to the ftable itself. For example:

parent = lazytree.parsefile("xhtml-spec.xml")
local count = 0
local ftable
ftable = {
  a=function (a)
    if a.attr.href then
      count = count + 1
    end
    -- uncomment to search for <a> elements inside other <a> elements
    -- xmliter.switch(a, ftable)
  end
}
ftable[true] = ftable
xmliter.switch(parent, ftable)
print(count) 

(Note that we cannot write "local ftable={... switch(ftable) }" because in Lua a local variable is not yet in scope inside its own initializer; hence the separate "local ftable" declaration.)

Options processing

The opts table controls various options for processing.

If opts.no_chardata is set, any unexpected character data (that is, not handled by an ftable[""] entry) results in an error.

If opts.no_tags is set, any unexpected child elements (those not mentioned in ftable or handled by an ftable[true] entry) result in an error.

If opts.parent is set, it is passed to handler functions as the parent node of the parent argument (the previous_parent argument of the [0] and [-1] handlers). This is useful when calling switch recursively if the new ftable contains [0] or [-1] handlers.
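The effect of no_chardata and no_tags can be sketched as follows. This is an assumption-laden illustration rather than the library's code: switch_sketch is a hypothetical dispatcher, the error messages are invented, opts.parent is left out, and the tree is hand-built.

```lua
-- Hypothetical dispatcher showing only the two strictness options.
local function switch_sketch(parent, ftable, opts)
  opts = opts or {}
  for i = 1, (parent.n or #parent) do
    local child = parent[i]
    if type(child) == "string" then
      if ftable[""] then
        ftable[""](child, parent)
      elseif opts.no_chardata then
        error("unexpected character data in <" .. parent.name .. ">")
      end
    elseif ftable[child.name] then
      ftable[child.name](child, parent)
    elseif ftable[true] then
      ftable[true](parent, child)   -- argument order as described above
    elseif opts.no_tags then
      error("unexpected element <" .. child.name .. ">")
    end
  end
end

-- Hand-built stand-in for <log><entry time="12:30"/><checkpoint/></log>
local log = {
  name = "log", attr = {}, n = 2,
  { name = "entry", attr = { time = "12:30" }, n = 0 },
  { name = "checkpoint", attr = {}, n = 0 },
}
local ftable = { entry = function(entry) end }

-- Without options the unknown <checkpoint/> is silently ignored.
local ok_lenient = pcall(switch_sketch, log, ftable)
-- With no_tags set it raises an error instead.
local ok_strict, err = pcall(switch_sketch, log, ftable, { no_tags = true })
print(ok_lenient, ok_strict)
```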


Last edited February 29, 2004 12:30 am GMT