lua-l archive



On Thu, Jan 23, 2014 at 7:26 AM, Luiz Henrique de Figueiredo
<lhf@tecgraf.puc-rio.br> wrote:
> I don't do anything at the moment about text data, that is, data
> not inside <...>, mainly because I don't really understand the semantics of
> this in XML. If all such strings are to be concatenated then it should be simple.

Text data is just another XML node. If you have
<a1>t1<a2>t2</a2>t3</a1> then there are three text nodes: t1, t2, and
t3. The children of a1 are, in order, t1, a2, and t3; a2 has one
child, t2.
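To make that concrete, here is one possible Lua rendering of that tree. It is a sketch, not the output of expat or of any existing binding: it assumes text nodes are plain strings and that the element name sits under a "#kind" key (the naming idea discussed further down).

```lua
-- Hypothetical tree for <a1>t1<a2>t2</a2>t3</a1>:
-- elements are tables whose array part holds the children in order,
-- text nodes are plain strings.
local tree = {
  ["#kind"] = "a1",
  "t1",
  { ["#kind"] = "a2", "t2" },
  "t3",
}

-- Count text nodes with a depth-first walk.
local function count_text(node)
  if type(node) == "string" then
    return 1
  end
  local n = 0
  for _, child in ipairs(node) do
    n = n + count_text(child)
  end
  return n
end

print(#tree)             --> 3  (children of a1: t1, a2, t3)
print(count_text(tree))  --> 3  (t1, t2, t3)
```

Because the children live in the array part, ipairs visits them in document order and never sees the "#kind" key.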

Practically speaking, you get one text node for every >...< span,
whether or not the two brackets belong to the same tag. No
concatenation is involved (although concatenating the text nodes is a
common operation if you want to strip the markup).
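Stripping the markup then amounts to a depth-first walk that concatenates every text node. A minimal sketch over the same hypothetical tree shape (strings for text, tables with "#kind" for elements); `text_content` and `strip_markup` are illustrative names, not part of any binding:

```lua
-- Append every text node under `node` to `out`, in document order.
local function text_content(node, out)
  out = out or {}
  if type(node) == "string" then
    out[#out + 1] = node
  else
    for _, child in ipairs(node) do
      text_content(child, out)
    end
  end
  return out
end

-- Join the collected pieces into one string.
local function strip_markup(node)
  return table.concat(text_content(node))
end

local tree = { ["#kind"] = "a1", "t1", { ["#kind"] = "a2", "t2" }, "t3" }
print(strip_markup(tree))  --> t1t2t3
```

Whitespace-only text nodes are ordinary strings in this shape, so they come through unchanged.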

A text node containing only whitespace is still a text node.

An element like <a></a>, with no characters inside, is debatable: it
can be read as containing one empty text node or as having no children
at all, and the spec permits either interpretation. In practice you
don't need to worry about this, since a barebones binding is just going
to expose whatever decisions expat made.

You could probably get away with representing text nodes as plain
strings, just as you're representing tags as tables: insert each string
into the parent's array in document order.
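Building such a tree from a streaming parser comes down to keeping a stack of open elements and appending each string to the element on top. The sketch below does not use any real binding's API: `start_element`, `character_data` and `end_element` are hypothetical stand-ins for whatever callbacks the binding exposes, replayed by hand here. It does account for one real expat behavior: a single run of contiguous text may be delivered across several character-data callbacks, so adjacent strings are merged instead of becoming separate nodes.

```lua
-- Hypothetical tree builder driven by SAX-style events.
local function new_builder()
  local root = { ["#kind"] = "#document" }
  local stack = { root }          -- open elements, innermost last
  local b = {}

  function b.start_element(name, attrs)
    local node = { ["#kind"] = name }
    for k, v in pairs(attrs or {}) do
      node[k] = v                 -- attributes share the table with "#kind"
    end
    local parent = stack[#stack]
    parent[#parent + 1] = node
    stack[#stack + 1] = node
  end

  function b.character_data(text)
    local parent = stack[#stack]
    local last = parent[#parent]
    -- expat may split one text run over several callbacks: merge them.
    if type(last) == "string" then
      parent[#parent] = last .. text
    else
      parent[#parent + 1] = text
    end
  end

  function b.end_element(name)
    stack[#stack] = nil           -- pop the element just closed
  end

  function b.result()
    return root[1]                -- the document element
  end

  return b
end

-- Replay the events for <a1>t1<a2>t2</a2>t3</a1>, with "t3" split in two.
local b = new_builder()
b.start_element("a1", {})
b.character_data("t1")
b.start_element("a2", {})
b.character_data("t2")
b.end_element("a2")
b.character_data("t")
b.character_data("3")
b.end_element("a1")

local tree = b.result()
print(tree["#kind"], tree[1], tree[2]["#kind"], tree[3])
--> a1, t1, a2, t3 (tab-separated)
```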

I'd be careful about using the name "kind" the way you are, since
<foo kind='bar' /> is perfectly reasonable XML. You should either use
a key that isn't a legal XML name ("#kind", perhaps) or use an XML
namespace dedicated to your binding ("luaexpat:kind"). Unfortunately,
this does mean you have to write foo["#kind"] instead of foo.kind. :(
(I don't think any character that is legal in a Lua identifier is
illegal in an XML name, so any key you can reach with dot syntax could
also turn up as an attribute name.)
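The collision is easy to demonstrate. In this sketch `make_node` is a hypothetical helper that copies attributes into the same table as the element name: with a plain "kind" key the attribute wins, while "#kind" stays safe. A one-line accessor (also hypothetical) is one way to avoid writing the brackets everywhere.

```lua
-- Store the element name under `key`, then copy attributes alongside it.
local function make_node(key, name, attrs)
  local node = { [key] = name }
  for k, v in pairs(attrs) do
    node[k] = v
  end
  return node
end

-- <foo kind='bar'/> with a plain "kind" key: the element name is lost.
local bad = make_node("kind", "foo", { kind = "bar" })
print(bad.kind)                  --> bar

-- "#kind" is not a legal XML name, so no attribute can overwrite it.
local good = make_node("#kind", "foo", { kind = "bar" })
print(good["#kind"], good.kind)  --> foo, bar (tab-separated)

-- Accessor that hides the bracket syntax at call sites.
local function kind(node) return node["#kind"] end
print(kind(good))                --> foo
```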

You could also represent nodes as tables of the form { kind="foo",
attrs={kind="bar"}, children={} }, using kind="#text" or something
similar for the text nodes described above, but of course that makes
the Lua-side code more complicated and less elegant.
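For comparison, here is a sketch that converts the compact shape into that more explicit layout. The kind/attrs/children fields follow the suggestion above; storing a text node's content in a `text` field, and the name `to_explicit`, are assumptions of this example.

```lua
-- Convert compact nodes (strings for text, tables with "#kind" plus
-- attributes and an array of children) into explicit tables.
local function to_explicit(node)
  if type(node) == "string" then
    return { kind = "#text", text = node }  -- `text` field is hypothetical
  end
  local out = { kind = node["#kind"], attrs = {}, children = {} }
  for k, v in pairs(node) do
    -- string keys other than "#kind" are attributes; integer keys are children
    if type(k) == "string" and k ~= "#kind" then
      out.attrs[k] = v
    end
  end
  for i, child in ipairs(node) do
    out.children[i] = to_explicit(child)
  end
  return out
end

local compact = { ["#kind"] = "foo", kind = "bar", "hello" }
local e = to_explicit(compact)
print(e.kind, e.attrs.kind, e.children[1].kind, e.children[1].text)
--> foo, bar, #text, hello (tab-separated)
```

Every attribute access now goes through .attrs and every child through .children, which is the extra Lua-side verbosity the message refers to.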

/s/ Adam