On 7-Mar-07, at 6:59 PM, Javier Guerra wrote:

On Wednesday 07 March 2007, Greg McCreath wrote:
I do like the element.sub-element.sub-element.@attribute-name type syntax of E4X, and it's not too far from Lua syntax, is it? JavaScript also uses
associative arrays a lot.

That seems nice until you get some XML with lots of text content intermixed with tags (think of an HTML paragraph with <span>s in it), or an element with several subelements sharing the same tag name (a table row with several fields,
a database result, etc.).

One hack I've used is to set string.TagType = "#TEXT" (or whatever).
Then  ("foo").TagType is, umm, "expected" and you don't need special
processing in some cases. Clearly, you can only process children
of a node sequentially, but you can filter on TagType at least.
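
A minimal sketch of that hack, assuming a node layout in which element nodes are tables carrying a TagType field and text nodes are plain strings. The layout and the helper name children_of_type are illustrative assumptions, not anything from an existing library. It relies on all strings sharing a metatable whose __index is the string table (Lua 5.1 and later):

```lua
-- Every string indexes through the `string` table, so a field set here
-- is visible on every string value.
string.TagType = "#TEXT"

-- Hypothetical node layout: element nodes are tables with a TagType
-- field and integer-keyed children; text nodes are plain strings.
local para = {
  TagType = "p",
  "Some ",
  { TagType = "span", "styled" },
  " text",
}

-- Collect the children of `node` whose TagType equals `tagtype`.
-- No type() check is needed: strings answer .TagType as well.
local function children_of_type(node, tagtype)
  local out = {}
  for _, child in ipairs(node) do
    if child.TagType == tagtype then
      out[#out + 1] = child
    end
  end
  return out
end

local texts = children_of_type(para, "#TEXT")
local spans = children_of_type(para, "span")
```

The filter walks the children sequentially, as noted above; the only gain is that text and element children can be tested the same way.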

Token filtering would let you turn @foo into a valid id token,
which can be handy; I've tried that to allow node.@foo as an
alternative to node.attr.foo. Token filtering could also expand
@foo into the full attr.foo thing, but the goal would be to
reduce the complexity of the node datatype to something with only
integer keys (children) and strings starting "@" (attributes),
leaving normal strings free for methods and instance variables.
The key here is that the parser doesn't care whether an id actually
conforms to the lexical description of an id, so token filtering
can turn any character sequence into an id.
(You can also do that trivially by patching llex.c, of course :)
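
Stock Lua has no token filter, so the following is only a sketch of the resulting node datatype, written with explicit string keys; with a filter installed, node.@class would presumably be spelled that way instead of node["@class"]. The names Node, Node.new and attrs are hypothetical:

```lua
-- Node layout assumed here: integer keys are children, string keys
-- starting with "@" are attributes, and every other string key is free
-- for methods (via the metatable) and instance variables.
local Node = {}
Node.__index = Node

function Node.new(tag)
  return setmetatable({ tag = tag }, Node)
end

-- Return a table of attribute name -> value, with the "@" stripped.
function Node:attrs()
  local out = {}
  for k, v in pairs(self) do
    if type(k) == "string" and k:sub(1, 1) == "@" then
      out[k:sub(2)] = v
    end
  end
  return out
end

local td = Node.new("td")
td["@class"] = "total"   -- attribute
td["@colspan"] = "2"     -- attribute
td[1] = "42"             -- child (a text node)

local attrs = td:attrs()
```

Here tag is an ordinary instance variable and attrs an ordinary method; neither can collide with an attribute name, because attribute keys always carry the "@" prefix.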

That doesn't deal with colons in attributes; dealing correctly
with namespaces is a pain, since to do things right you need to
normalize the namespace into a url or whatever, which is
extremely unfriendly. One possibility might be to declare
canonical prefix mappings (that is, url->namespace) and munge
the attributes and tagnames after they're normalized to urls.
Unfortunately, using ':' in Lua for namespacing is not really
possible, but again token filtering would allow the use of ::,
for example, which might not be too bad.
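
One way the canonical-prefix idea might look, as a rough sketch: resolve each document prefix to its namespace url, then map the url back to a declared canonical prefix. The table canonical, the function canonical_name and the fallback format for unmapped urls are all assumptions made for illustration; the names are built as plain strings, since "::" is not usable in an identifier without a token filter:

```lua
-- Hypothetical canonical mapping: namespace url -> preferred prefix.
local canonical = {
  ["http://www.w3.org/1999/xhtml"] = "html",
  ["http://www.w3.org/2000/svg"] = "svg",
}

-- `declared` maps the prefixes used by one particular document to
-- namespace urls (what its xmlns:... declarations say).
local function canonical_name(qname, declared)
  local prefix, localname = qname:match("^([^:]+):(.+)$")
  if not prefix then
    return qname                      -- unprefixed: left alone in this sketch
  end
  local url = declared[prefix]
  if not url then
    error("undeclared namespace prefix: " .. prefix)
  end
  local canon = canonical[url]
  if not canon then
    return "{" .. url .. "}" .. localname  -- no canonical prefix: keep the url
  end
  return canon .. "::" .. localname
end

-- Two documents using different prefixes for the same namespace
-- end up with the same munged name.
local n1 = canonical_name("h:td", { h = "http://www.w3.org/1999/xhtml" })
local n2 = canonical_name("xhtml:td", { xhtml = "http://www.w3.org/1999/xhtml" })
```

Default namespaces and attribute scoping are ignored here; they are part of what makes doing this correctly a pain.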

The biggest annoyance with the DOM model, for me anyway, is
that it insists on parent links, so that you can't share
children; inserting a child means unlinking it from wherever
it used to be (and if that involves table.remove(), it can
get expensive). That might be intrinsic to the "standard
semantics" of XML, since any node can have an id, which must
be unique, thereby making the node unsharable. This leads to
the peculiarity of XML fragments (lists of nodes) which
empty themselves when you insert them into an XML tree.
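
To make the cost concrete, here is a small sketch of DOM-style insertion with parent links, assuming nodes are tables with integer-keyed children and a parent field. The functions append and append_fragment are hypothetical names, not the API of any real DOM binding:

```lua
-- Append `child` to `parent`, DOM style: a node has exactly one parent,
-- so it is first unlinked from wherever it currently lives.
local function append(parent, child)
  local old = child.parent
  if old then
    for i, c in ipairs(old) do
      if c == child then
        table.remove(old, i)   -- shifts every later sibling down
        break
      end
    end
  end
  child.parent = parent
  parent[#parent + 1] = child
end

-- Inserting a fragment moves its nodes one by one, so the fragment
-- ends up empty.
local function append_fragment(parent, frag)
  while frag[1] do
    append(parent, frag[1])
  end
end

local frag = {}
append(frag, { tag = "li" })
append(frag, { tag = "li" })

local tree = { tag = "ul" }
append_fragment(tree, frag)
```

After the call the two li nodes belong to tree and frag has no children left. Each move pays for a linear search plus a table.remove() shift in the old parent, which is where the expense comes from.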

Of course, the other annoyance is that an XML node might
either be styled text or it might be structured data (or
even some weird combination of both), and the semantics
of the two cases seem quite different, at least to me.