

I tried to send this yesterday.

Jay

---------- Forwarded message ----------
Date: Sun, 11 Jan 2004 22:07:36 -0500
From: Jay Carlson <nop@nop.com>
To: lua@bazar2.conectiva.com.br
Cc: nop@nop.com
Subject: XML representation, again

Revisiting my Lua XML-RPC binding, I ran into the mess of representing
XML documents as reasonable-looking Lua tables.

What I've been doing is:

   {"tagname", child, child, child, attr={attributename="value"}}

where child can be either a string or another xml node.

Although this allows tables to be easily written (in particular,
enforcing the presence of a tag name), it breaks ipairs() for iterating
over children.
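
To make the ipairs() problem concrete, here is a minimal sketch, assuming Lua 5.1 or later (for the # operator). The node contents are made up for illustration.

```lua
-- Hypothetical node in the {"tagname", child, ...} layout described above.
local node = {"person", "Jay", {"br"}, attr = {id = "1"}}

-- ipairs() starts at index 1, so the tag name is visited as if it were a child.
local seen = {}
for i, v in ipairs(node) do
  seen[i] = v
end

-- The real children live at indices 2..n, so callers must skip index 1 by hand.
local children = {}
for i = 2, #node do
  children[#children + 1] = node[i]
end
```

Every consumer of the node has to remember the "start at 2" rule, which is the annoyance in question.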

(Digression 1: why not put the attributes into the table directly, like

   {"tagname", child, child, attributename="value"}

?  Because that blows the entire table string namespace on attributes
and now we can't start adding useful functions like

   node:findChild("age")

because somebody could reasonably write <z findChild='a'/>.  Well, this
is the price Lua pays for having only a single namespace in tables.
Python avoids this problem by having separate namespaces for "." and
"[]"---and then gets into other muddles as a result.)

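The collision in Digression 1 can be sketched as follows, assuming methods are supplied through a metatable's __index and Lua 5.1 or later. The Node table and the body of findChild are hypothetical; only the name findChild comes from the text above.

```lua
-- Hypothetical method table shared by all nodes.
local Node = {}
Node.__index = Node

-- Return the first child element whose tag name (index 1) matches.
function Node:findChild(name)
  for i = 2, #self do
    local child = self[i]
    if type(child) == "table" and child[1] == name then
      return child
    end
  end
  return nil
end

-- Attributes kept in a separate 'attr' table: the method stays reachable.
local safe = setmetatable({"z", {"age", "30"}, attr = {findChild = "a"}}, Node)

-- Attributes stored directly in the node, as for <z findChild='a'/>.
-- The raw field is found before __index is consulted, so it hides the method.
local clash = setmetatable({"z", {"age", "30"}, findChild = "a"}, Node)
```

With these definitions safe:findChild("age") returns the age node, while clash:findChild("age") raises an error because clash.findChild is the string "a".
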
So I see two reasonable ways to go from here.

The first is to bump everything back one integer.  We then have

   {[0]="tagname", child, child, attr={attributename="value"}}

and now users just have to remember that node[0] is special.  The other
is to use name:

   {name="tagname", child, child, attr={attributename="value"}}

and then we write node.name.  I'm leaning towards the second.
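
Both layouts leave indices 1..n to the children, so ipairs() works unchanged. A small sketch, with made-up node contents and a hypothetical count_children helper:

```lua
-- Option 1: tag name at index 0, which ipairs() never visits.
local by_zero = {[0] = "person", "Jay", {[0] = "br"}, attr = {id = "1"}}

-- Option 2: tag name under the string key 'name'.
local by_name = {name = "person", "Jay", {name = "br"}, attr = {id = "1"}}

-- Hypothetical helper: count children by walking the array part only.
local function count_children(node)
  local n = 0
  for _ in ipairs(node) do
    n = n + 1
  end
  return n
end
```

Here count_children gives 2 for either node; the only difference a caller sees is node[0] versus node.name.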

For some apps it would make sense to add sugary procedures for node
creation:

   tagname{child, child, attr={attributename="value"}}

where "tagname" can set appropriate metatables for the node:findChild
goodness.  For sugar overdose:

   tagname{child, child, attributename="value"}

I consider the second to be pushing my insulin limit, as it separates
construction syntax from use syntax (node.attr.attributename).  But it
might make sense for some communities.
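
One way the two sugar levels might look, assuming the name= layout and Lua 5.1 or later. The helpers tag and sugary_tag, the Node method table, and the body of findChild are all hypothetical names for illustration, not part of any existing library.

```lua
-- Hypothetical method table for the node:findChild goodness.
local Node = {}
Node.__index = Node

function Node:findChild(name)
  for _, child in ipairs(self) do
    if type(child) == "table" and child.name == name then
      return child
    end
  end
  return nil
end

-- tag("person") returns a constructor, so person{...} builds a <person> node.
local function tag(name)
  return function(t)
    t.name = name
    t.attr = t.attr or {}
    return setmetatable(t, Node)
  end
end

-- "Sugar overdose" variant: string keys in the constructor become attributes.
local function sugary_tag(name)
  return function(t)
    local node = {name = name, attr = {}}
    for k, v in pairs(t) do
      if type(k) == "number" then
        node[k] = v        -- positional entries are children
      else
        node.attr[k] = v   -- everything else is an attribute
      end
    end
    return setmetatable(node, Node)
  end
end

local person, age = tag("person"), tag("age")
local doc = person{age{"30"}, "some text", attr = {id = "1"}}
local img = sugary_tag("img"){"alt text", src = "a.png"}
```

Note that img.src is nil after construction; the value has to be read back as img.attr.src, which is exactly the construction/use mismatch mentioned above.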

One interesting question is whether internal reps must have fully merged
chardata nodes.  It would be nice to prohibit

   {name="text", "I am spli", "t up into thr", "ee pieces"}

especially given naive code dealing with UTF-8.  External libraries like
expat will keep UTF-8 fragments from spanning legitimate stuff like

   {name="text", "I am split into", {name="br"}, "two lines"}
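
If a convention did require fully merged chardata, a normalizing pass could enforce it. A minimal sketch, assuming the name= layout and Lua 5.1 or later; merge_chardata is a hypothetical helper that returns a new tree rather than modifying its argument.

```lua
-- Hypothetical helper: copy a node, joining runs of adjacent string children.
local function merge_chardata(node)
  local out = {name = node.name, attr = node.attr}
  local pending = {}  -- adjacent strings collected but not yet emitted

  local function flush()
    if #pending > 0 then
      out[#out + 1] = table.concat(pending)
      pending = {}
    end
  end

  for _, child in ipairs(node) do
    if type(child) == "string" then
      pending[#pending + 1] = child
    else
      flush()                               -- an element ends the current run
      out[#out + 1] = merge_chardata(child) -- normalize nested nodes too
    end
  end
  flush()
  return setmetatable(out, getmetatable(node))
end
```

Applied to the split-up example above this yields a single string child, while the node with the br element keeps its three children.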

It would be nice to have some agreement here.  I don't promise that I'll
rewrite the internals of Lua XML-RPC to use this new format (hey, it's
working now with Roberto's expat binding) but a convention would be nice.

Meta: Python decides these issues by shipping things in the standard
distro.  Lua doesn't decide these things.  I don't know what's better.

Jay