lua-l archive



Revisiting my Lua XML-RPC binding, I ran into the mess of representing XML documents as reasonable-looking Lua tables.

What I've been doing is:

  {"tagname", child, child, child, attr={attributename="value"}}

where child can be either a string or another xml node.

Although this makes tables easy to write (in particular, it enforces the presence of a tag name), it breaks ipairs() for iterating over children: the first value ipairs() yields is the tag name, not a child.
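To make the ipairs() problem concrete, here is a minimal sketch using the current layout. The sample node and the `children_of` helper are illustrative only. A plain ipairs() loop picks up the tag name, so code that wants only the children has to start from index 2 by hand:

```lua
-- Current representation: tag name at index 1, children after it.
local node = {"tagname", "some text", {"child"}, attr = {attributename = "value"}}

-- Naive iteration: the first value ipairs() yields is the tag name.
local naive = {}
for _, child in ipairs(node) do
  naive[#naive + 1] = child
end
-- naive[1] is "tagname", which is not a child at all

-- Workaround (hypothetical helper): skip index 1 and collect the children.
local function children_of(n)
  local result = {}
  for i = 2, #n do
    result[#result + 1] = n[i]
  end
  return result
end

local kids = children_of(node)
```

Every consumer of the tree has to remember this off-by-one, which is the cost the alternatives below try to remove.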

(Digression 1: why not put the attributes into the table directly, like

  {"tagname", child, child, attributename="value"}

? Because that blows the entire table string namespace on attributes and now we can't start adding useful functions like

  node:findChild("age")

because somebody could reasonably write <z findChild='a'/>. Well, this is the price Lua pays for having only a single namespace in tables. Python avoids this problem by having separate namespaces for "." and "[]", and then gets into other muddles as a result.)
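A small sketch of the collision described in the digression. The `Node` method table and this `findChild` implementation are hypothetical. The point is only that, with a single table namespace, an attribute stored directly on the node under the key findChild shadows the method that would otherwise be found through the metatable's `__index`:

```lua
-- Hypothetical method table shared by all nodes via a metatable.
local Node = {}
Node.__index = Node

-- Return the first child element whose tag name matches `name`.
-- Assumes the tag-name-at-index-1 layout.
function Node:findChild(name)
  for i = 2, #self do
    local child = self[i]
    if type(child) == "table" and child[1] == name then
      return child
    end
  end
  return nil
end

-- Attributes kept in their own `attr` table: the method stays reachable.
local safe = setmetatable({"z", {"age", "7"}, attr = {findChild = "a"}}, Node)
local found = safe:findChild("age")

-- Attributes stored directly in the node, as for <z findChild='a'/>:
-- the raw field "findChild" hides Node.findChild.
local clash = setmetatable({"z", {"age", "7"}, findChild = "a"}, Node)
local ok = pcall(function() return clash:findChild("age") end)
-- ok is false: clash.findChild is the string "a", which cannot be called
```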

So I see two reasonable ways to go from here.

The first is to bump everything back one integer.  We then have

  {[0]="tagname", child, child, attr={attributename="value"}}

and now users just have to remember that node[0] is special. The other is to use a name field:

  {name="tagname", child, child, attr={attributename="value"}}

and then we write node.name.  I'm leaning towards the second.
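Both alternatives fix iteration, because ipairs() starts at index 1 and never visits index 0 or string keys. A short comparison sketch; the sample nodes and the `count_children` helper are illustrative only:

```lua
-- Alternative 1: tag name at index 0, which ipairs() never visits.
local zero = {[0] = "tagname", "text", {[0] = "child"}, attr = {attributename = "value"}}

-- Alternative 2: tag name in a `name` field.
local named = {name = "tagname", "text", {name = "child"}, attr = {attributename = "value"}}

-- In both layouts, ipairs() now yields only the children.
local function count_children(node)
  local n = 0
  for _ in ipairs(node) do
    n = n + 1
  end
  return n
end

-- Tag-name access differs (node[0] versus node.name); iteration does not.
print(zero[0], count_children(zero))
print(named.name, count_children(named))
```

The difference is purely in how the tag name is read; node.name is self-describing, while node[0] relies on convention.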

For some apps it would make sense to add sugary procedures for node creation:

  tagname{child, child, attr={attributename="value"}}

where the tagname function can set appropriate metatables on the node to provide the node:findChild goodness. For sugar overdose:

  tagname{child, child, attributename="value"}

I consider the second to be pushing my insulin limit, as it separates construction syntax (attributename="value") from use syntax (node.attr.attributename). But it might make sense for some communities.
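One way such constructors might be produced is a small factory that returns a function per tag name, so that tagname{...} builds a node with a `name` field and a shared metatable. This is a sketch under assumptions: `element`, its `move_attrs` flag, and the `Node` methods are hypothetical names, and the key "attr" is treated as reserved. With `move_attrs` set, the factory also accepts the "sugar overdose" form by moving string-keyed entries into node.attr:

```lua
-- Hypothetical shared methods for nodes in the name-field layout.
local Node = {}
Node.__index = Node

function Node:findChild(name)
  for _, child in ipairs(self) do
    if type(child) == "table" and child.name == name then
      return child
    end
  end
  return nil
end

-- element("tagname") returns a constructor usable as tagname{...}.
-- The key "attr" must hold a table; with move_attrs = true, other
-- string keys are treated as attributes and moved into node.attr.
local function element(tagname, move_attrs)
  return function(spec)
    local node = {name = tagname, attr = {}}
    for k, v in pairs(spec) do
      if type(k) == "number" then
        node[k] = v                      -- children keep their positions
      elseif k == "attr" then
        for ak, av in pairs(v) do node.attr[ak] = av end
      elseif move_attrs and type(k) == "string" then
        node.attr[k] = v                 -- sugar form: attributename="value"
      end
    end
    return setmetatable(node, Node)
  end
end

-- Usage: the first form, with an explicit attr table.
local person, age = element("person"), element("age")
local doc = person{"some text", age{"7"}, attr = {id = "1"}}
local found = doc:findChild("age")

-- Usage: the sugar-overdose form, attributes given as plain fields.
local sweet = element("person", true){"some text", id = "1"}
```

Whichever form is used at construction, the resulting nodes are read the same way (node.attr.id), which keeps the use syntax uniform even if the construction syntax is sweeter.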

One interesting question is whether internal reps must have fully merged chardata nodes. It would be nice to prohibit

  {name="text", "I am spli", "t up into thr", "ee pieces"}

especially given naive code dealing with UTF-8. External libraries like expat will keep UTF-8 fragments from spanning legitimate stuff like

  {name="text", "I am split into", {name="br"}, "two lines"}
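If the convention did require fully merged character data, a library receiving fragmented text could normalize a tree before handing it to users. A minimal sketch, assuming the name-field layout; `merge_text` is a hypothetical helper. Because Lua strings are byte strings, concatenating the pieces also rejoins any UTF-8 sequence that was cut between fragments, while element children such as {name="br"} still separate the text runs:

```lua
-- Merge runs of adjacent string children into one string, recursively.
-- Copies only the `name` and `attr` fields and keeps the node's metatable.
local function merge_text(node)
  local merged = {}
  local pending = {}  -- string pieces waiting to be joined

  local function flush()
    if #pending > 0 then
      merged[#merged + 1] = table.concat(pending)
      pending = {}
    end
  end

  for _, child in ipairs(node) do
    if type(child) == "string" then
      pending[#pending + 1] = child
    else
      flush()                                   -- an element ends the text run
      merged[#merged + 1] = merge_text(child)
    end
  end
  flush()

  merged.name, merged.attr = node.name, node.attr
  return setmetatable(merged, getmetatable(node))
end

local fixed = merge_text({name = "text", "I am spli", "t up into thr", "ee pieces"})
local kept = merge_text({name = "text", "I am split into", {name = "br"}, "two lines"})
```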

It would be nice to have some agreement here. I don't promise that I'll rewrite the internals of Lua XML-RPC to use this new format (hey, it's working now with Roberto's expat binding) but a convention would be nice.

Meta: Python decides these issues by shipping things in the standard distro. Lua doesn't decide these things. I don't know what's better.

Jay