lua-l archive

On Thursday, February 6, 2014, Marc Lepage <mlepage@antimeta.com> wrote:
On Thu, Feb 6, 2014 at 8:44 PM, Andrew Starks <andrew.starks@trms.com> wrote:


On Thursday, February 6, 2014, Luiz Henrique de Figueiredo <lhf@tecgraf.puc-rio.br> wrote:
My barebones library for parsing XML data into Lua tables based on expat
is available at
        http://www.tecgraf.puc-rio.br/~lhf/ftp/lua/#lxml

It works with both Lua 5.2 and 5.1. Here is the interesting part of the README:

The library exports just one function, which is returned by require"xml".
This function parses XML text held in a Lua string and returns a Lua table.
Every XML element in the string is represented by a table t. The element's
name is stored in t[0]. The element's attributes are stored as key-value
pairs in t. The element's children are stored in t[1], ..., t[#t]. These
are text and sub-elements, in the order they appear in the XML string.
If an error is found, the parse function returns nil, an error message,
and the position of the error in the string.
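Here is a minimal Lua sketch that makes the layout concrete. It does not call require"xml", because that needs the compiled library and expat. Instead it builds by hand the table the README's rules would give for <p id="x">Hi <b>there</b>!</p>, then walks it back into XML text. The function name toxml is hypothetical, and the sketch does not escape special characters.

```lua
-- Hand-built table following the README's layout (assumption: this is what
-- the parser would return for <p id="x">Hi <b>there</b>!</p>).
local doc = {
  [0] = "p",                 -- element name in t[0]
  id = "x",                  -- attribute as a string key
  "Hi ",                     -- children in t[1..#t], in document order:
  { [0] = "b", "there" },    --   a sub-element
  "!",                       --   and more text
}

-- Hypothetical serializer: string keys become attributes, t[1..#t] children.
-- No escaping of <, &, or quotes is done; illustration only.
local function toxml(t)
  local parts = { "<", t[0] }
  local keys = {}
  for k in pairs(t) do
    if type(k) == "string" then keys[#keys + 1] = k end
  end
  table.sort(keys)           -- attributes are unordered; sort for stable output
  for _, k in ipairs(keys) do
    parts[#parts + 1] = string.format(' %s="%s"', k, tostring(t[k]))
  end
  parts[#parts + 1] = ">"
  for i = 1, #t do
    local c = t[i]
    parts[#parts + 1] = type(c) == "table" and toxml(c) or c
  end
  parts[#parts + 1] = "</" .. t[0] .. ">"
  return table.concat(parts)
end

print(toxml(doc))  --> <p id="x">Hi <b>there</b>!</p>
```

Because #t only counts positive integer keys, the name in t[0] and the string-keyed attributes never get mixed up with the children.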

test.lua shows the library in action. In particular, it shows how to
remove empty strings consisting of whitespace only and how to simplify
the tables by moving string data to table entries; many but not all XML
files are like that. Neither task is necessary, but they can simplify
processing when applicable. In any case, the code in test.lua is just
sample code. Adapt it to your needs.
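The two clean-up steps might look roughly like this. It is not the code in test.lua. The names strip and simplify are hypothetical, and "moving string data to table entries" is read here as one assumption: a child element with no attributes and a single text child, such as <name>Bob</name>, becomes a string-keyed entry t.name = "Bob" on its parent.

```lua
-- Hypothetical helper: drop whitespace-only text children, recursively.
local function strip(t)
  local kept = {}
  for i = 1, #t do
    local c = t[i]
    if type(c) == "table" then
      kept[#kept + 1] = strip(c)   -- recurse into sub-elements
    elseif c:find("%S") then
      kept[#kept + 1] = c          -- keep text with a non-space character
    end
  end
  local n = #t
  for i = 1, n do t[i] = kept[i] end   -- nil past #kept clears old slots
  return t
end

-- Hypothetical helper: turn leaf elements into string entries of the parent.
-- Assumption: leaf names are unique and do not clash with attribute names;
-- a later duplicate silently overwrites an earlier one.
local function simplify(t)
  local kept = {}
  for i = 1, #t do
    local c = t[i]
    if type(c) == "table" then
      local hasattr = false
      for k in pairs(c) do
        if type(k) == "string" then hasattr = true break end
      end
      if not hasattr and #c == 1 and type(c[1]) == "string" then
        t[c[0]] = c[1]             -- <name>Bob</name>  ->  t.name = "Bob"
      else
        kept[#kept + 1] = simplify(c)
      end
    else
      kept[#kept + 1] = c
    end
  end
  local n = #t
  for i = 1, n do t[i] = kept[i] end
  return t
end

-- Usage on a hand-built table with indentation whitespace between children.
local person = {
  [0] = "person",
  "\n  ", { [0] = "name", "Bob" },
  "\n  ", { [0] = "age", "42" },
  "\n",
}
simplify(strip(person))
-- now person.name == "Bob", person.age == "42", and #person == 0
```

Once simplified, these entries look just like attributes, and repeated element names collide. That is why this step only suits files where each leaf name appears once.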

The code is in the public domain. All feedback is welcome.
Please send comments, suggestions, and bug reports to me directly.


For what it's worth, I love this layout. Using indexes for children makes perfect sense, and so does using a hash for attributes, given that they are not ordered. I've made a couple of toy XML parsers and I'm kicking myself that I didn't use this design. :)

 
I have used this sort of design a few times, and I do like it. Last year I adapted the simple Lua XML parsing code to store this format.


I additionally store (non-whitespace) text in t[-1], so all string keys are attributes and all keys 1..N are children, the same as above.
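Here is a sketch of the t[-1] variant under one possible reading. This is an assumption, because the actual code is not shown. Whitespace-only text is dropped, the remaining text runs are concatenated into t[-1], and only sub-elements stay in t[1], ..., t[#t]. The function name totextfield is hypothetical.

```lua
-- Hypothetical conversion from the README layout to the t[-1] variant.
-- Assumption: all non-whitespace text runs of an element are concatenated.
local function totextfield(t)
  local text, elems = {}, {}
  for i = 1, #t do
    local c = t[i]
    if type(c) == "table" then
      elems[#elems + 1] = totextfield(c)   -- convert sub-elements too
    elseif c:find("%S") then               -- skip whitespace-only text
      text[#text + 1] = c
    end
  end
  local n = #t
  for i = 1, n do t[i] = elems[i] end      -- only elements remain in 1..N
  if #text > 0 then t[-1] = table.concat(text) end
  return t
end

local e = totextfield({ [0] = "a", "x", { [0] = "b" }, "y" })
-- e[-1] == "xy", #e == 1, e[1][0] == "b"
```

Every key then falls into exactly one class: strings are attributes, 0 is the name, -1 is the text, and 1..N are child elements.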

Marc

My understanding of XML is not complete. However, my reading of the spec suggests that all text data should be stored in the sequence, just like nested tags. That is:

<t> hello <a/> world <b> ! </b> how are you? </t> 

<t> has five children: " hello ", the <a/> element, " world ", the <b> element, and " how are you? ".

So storing text data in negative keys would seem to make less sense. What am I missing?
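The point about ordering can be checked with a hand-built table for the example above in the sequence layout. This assumes each text run arrives as one string with its surrounding spaces included, and the helper names are hypothetical. Because text and sub-elements share one sequence, the children come back in document order and the element's full text content can be rebuilt. A single concatenated t[-1] string would not record that " world " sat between <a/> and <b>.

```lua
-- Hand-built table for <t> hello <a/> world <b> ! </b> how are you? </t>
-- in the sequence layout (assumption: one string per text run).
local t = {
  [0] = "t",
  " hello ",
  { [0] = "a" },
  " world ",
  { [0] = "b", " ! " },
  " how are you? ",
}

-- List children in document order: elements as <name>, text quoted.
local function children(t)
  local out = {}
  for i = 1, #t do
    local c = t[i]
    out[i] = type(c) == "table" and ("<" .. c[0] .. ">") or string.format("%q", c)
  end
  return out
end

-- Concatenate all descendant text in document order; this only works
-- because text and sub-elements share one ordered sequence.
local function textcontent(t)
  local parts = {}
  for i = 1, #t do
    local c = t[i]
    parts[#parts + 1] = type(c) == "table" and textcontent(c) or c
  end
  return table.concat(parts)
end

print(#t)                               -- 5
print(table.concat(children(t), ", "))
print(textcontent(t))
```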

Also, I thought whitespace was significant in generic XML but that its handling could be defined by the schema, although I'm pretty sure I'm making that up...

-Andrew