- Subject: Re: [ANN] SLAXML - pure Lua, robust-ish, SAX-like streaming XML processor
- From: Gavin Kistner <phrogz@...>
- Date: Tue, 19 Feb 2013 21:23:06 -0700
On Feb 19, 2013, at 11:14 AM, firstname.lastname@example.org wrote:
> Basically, they said that when using a SAX parser you almost always want
> to maintain some sort of stack of elements as well as check if the open
> and close tags matched. So what they did was keep an internal stack (like
> the nsStack you have but with more stuff) and expose it to the user via
> extra arguments passed to the handlers and by assigning a meaning to their
> return values.
I could certainly do that. (I do, in fact, do it with the DOM parser that's optionally part of SLAXML.) I may add it to the SAX part in order to be slightly more validating. My original thoughts, however, were to keep the SAX parser as fast and lean as possible.
> (the reason they gave for using the return values is because that lets you
> choose what sort of value gets put into the stack. They also wanted to be
> able to put immutable values such as strings and numbers in the stack)
Interesting; that seems sort of nice, but like more cooperation on the SAX side than I'm used to seeing. Often the point of using a SAX parser is because you *don't* want to build a full DOM, but want to cherry-pick values as they fly by.
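For reference, here is a minimal sketch of the stack-plus-return-values idea described in the quoted text. It is not SLAXML's API: the `run` driver, the event tables, and the handler names `startElement`/`closeElement` are all hypothetical, and a hand-written event list stands in for real parser output.

```lua
-- Hypothetical driver, not part of SLAXML. It keeps the element stack itself,
-- checks that close tags match, and passes the stack to the handlers.
local function run(events, handlers)
  local stack = {}  -- each entry: { name = element name, value = handler's return value }
  for _, ev in ipairs(events) do
    if ev.type == "start" then
      -- Whatever the handler returns (table, string, number, nil) is stored
      -- alongside the element name.
      local value = handlers.startElement(ev.name, stack)
      stack[#stack + 1] = { name = ev.name, value = value }
    elseif ev.type == "close" then
      local top = stack[#stack]
      if not top or top.name ~= ev.name then
        error(("mismatched close tag </%s>"):format(ev.name))
      end
      stack[#stack] = nil
      handlers.closeElement(ev.name, top.value, stack)
    end
  end
  if #stack > 0 then
    error("unclosed element <" .. stack[#stack].name .. ">")
  end
end

-- Usage: each element stores an immutable string (its path) on the stack.
local paths = {}
run({
  { type = "start", name = "a" },
  { type = "start", name = "b" },
  { type = "close", name = "b" },
  { type = "close", name = "a" },
}, {
  startElement = function(name, stack)
    local parent = stack[#stack]
    return (parent and parent.value or "") .. "/" .. name
  end,
  closeElement = function(name, value)
    paths[#paths + 1] = value
  end,
})
```

The cost is visible here: the driver allocates one table per open element, which is the overhead a deliberately lean SAX parser avoids.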
> And a minor thing: is that "attribute" callback really needed? Most of the
> SAX stuff I saw just reads the attributes in a list and then passes them
> to the startElement handler.
Towards the same goal of keeping the SAX parser as lean as possible, I originally had the attribute() callback because the parser truly streamed: it saw the text indicating the start of an element, fired the callback, and then continued through the input looking for attributes, reporting each as soon as it saw it. I mean, what if someone had a pathological element with 10,000 attributes?
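A rough sketch of that streaming behaviour, under heavy simplifying assumptions: it scans a single open tag at the start of the string, accepts only quoted attribute values, and does no entity decoding. The function name `scanOpenTag` and the handler table are hypothetical and are not taken from SLAXML's source.

```lua
-- Simplified illustration only: fires startElement first, then reports each
-- attribute the moment it is found, without ever building a list.
local function scanOpenTag(xml, handlers)
  local _, pos, name = xml:find("^<([%w_:.%-]+)")
  if not name then return nil end
  handlers.startElement(name)  -- fired before any attribute has been read
  while true do
    -- Match: whitespace, attribute name, '=', and the opening quote character.
    local _, e, attr, quote = xml:find("^%s+([%w_:.%-]+)%s*=%s*([\"'])", pos + 1)
    if not attr then break end
    local close = xml:find(quote, e + 1, true)  -- plain search for the closing quote
    if not close then error("unterminated attribute value") end
    handlers.attribute(attr, xml:sub(e + 1, close - 1))  -- reported, never stored
    pos = close
  end
  return pos  -- index of the last character consumed
end

-- Usage: record callbacks in the order they fire.
local seen = {}
scanOpenTag([[<foo a="1" b='two'>]], {
  startElement = function(name) seen[#seen + 1] = "start:" .. name end,
  attribute = function(k, v) seen[#seen + 1] = k .. "=" .. v end,
})
```

Memory use stays constant however many attributes the element carries, which is the point of the 10,000-attribute case.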
But…when I added namespace support I needed to support `<foo xmlns="bar">` having the foo element get the bar namespace. I originally added a namespace() callback which you retroactively had to apply to the previous element, but this was bunk. So now…now I do maintain a list of attributes for the element, and then spin through them once I've finished with the element open tag.
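The code from the original message is not reproduced here. As a stand-in, the following is only a sketch of the general buffer-then-replay approach, with assumptions throughout: `openElement`, the shape of the `attrs` list, and the use of `false` as a "no namespace" marker are all hypothetical; only the name `nsStack` comes from the thread. It handles just the default `xmlns` attribute, not prefixed namespaces.

```lua
-- Hypothetical sketch: attributes were collected while scanning the open tag,
-- so the namespace can be resolved before startElement fires.
local function openElement(name, attrs, nsStack, handlers)
  -- attrs: array of { name = ..., value = ... } in document order
  local ns = nsStack[#nsStack] or nil  -- inherit the parent's default namespace
  for _, a in ipairs(attrs) do
    if a.name == "xmlns" then ns = a.value end
  end
  -- Store false rather than nil: assigning nil would not grow the table.
  nsStack[#nsStack + 1] = ns or false
  handlers.startElement(name, ns)      -- the element already knows its namespace
  for _, a in ipairs(attrs) do         -- now spin through the buffered attributes
    handlers.attribute(a.name, a.value)
  end
end

-- Usage: <foo id="x" xmlns="bar"> followed by a child with no attributes.
local log, nsStack = {}, {}
local handlers = {
  startElement = function(name, ns) log[#log + 1] = name .. "@" .. tostring(ns) end,
  attribute = function(k, v) log[#log + 1] = k .. "=" .. v end,
}
openElement("foo", { { name = "id", value = "x" }, { name = "xmlns", value = "bar" } }, nsStack, handlers)
openElement("child", {}, nsStack, handlers)
```

A close-tag handler would pop `nsStack`; that part is omitted to keep the sketch short.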
/me hangs head in shame