- Subject: Re: Cloning XML
- From: Dirk Laurie <dirk.laurie@...>
- Date: Mon, 18 May 2015 11:41:57 +0200
2015-05-18 8:22 GMT+02:00 Dirk Laurie <dirk.laurie@gmail.com>:
> 2015-05-18 2:24 GMT+02:00 Tim Channon <tc@gpsl.net>:
>> There are many XML decoding libraries of varying degrees of capability and
>> portability. The focus of these is decode.
>>
>> Discussion of regenerating identical XML from the decode seems to get lost
>> and particularly when the user has no idea of the XML meaning, simply wants
>> to intercept textually known entities within a context.
I found it quite easy to regenerate identical XML from the output of Roberto's
parser on the lua-users.org Wiki, but in a very specialized context: the XML
files were those produced by `pdftohtml -xml`, for which the DTD is
self-contained and a mere 49 lines long. That is a far cry from SVG's DTD,
which runs to over 300 lines and mainly pulls in other files, I'll admit.
Here is what I did: I tweaked Roberto's code to provide a metatable
for every table-valued item.
element_mt = { __tostring =
  function(s)
    if type(s)=='string' then return s end
    assert(s.tag)
    local render = render[s.tag]
    if type(render)=='string' then return render
    elseif type(render)=='function' then return render(s)
    else error("Can't convert type '"..s.tag.."' to a string")
    end
  end }
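If the parser itself is left untouched, the same effect could be had by walking the finished tree afterwards. The following is only a sketch under assumptions: `attach` and `demo_mt` are hypothetical names, and it assumes that child nodes sit in the array part of each node table, which may not match the tweaked parser.

```lua
-- Hypothetical helper, not from the original post: walk a parsed tree and
-- give every table-valued node the same metatable.
-- Assumes children are stored in the array part of each node.
local function attach(node, mt)
  if type(node) ~= 'table' then return node end  -- text stays a plain string
  for _, child in ipairs(node) do attach(child, mt) end
  return setmetatable(node, mt)
end

-- Stand-in metatable so the sketch runs on its own.
local demo_mt = { __tostring = function(s) return '<'..s.tag..'>' end }
local tree = attach({ tag='page', 'loose text', { tag='text', 'hello' } }, demo_mt)
```

After this, `tostring` works on any table-valued node, while text items remain ordinary strings.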
Then the regenerate routine becomes:
local function assemble(document)
  -- tconcat is assumed to be a local alias for table.concat
  local s = {}
  local box = document.first
  while box do
    s[#s+1] = tostring(box)
    box = box.next
  end
  return tconcat(s,'\n')
end
The global or upvalue "render" is
local render = {
  pdf2xml = lines,
  page = lines,
  document = assemble,
  text = contents,
  b = function(s) return '**'..contents(s)..'**' end,
  i = function(s) return '*'..contents(s)..'*' end,
  a = contents,
  outline = "<outline>",
  fontspec = "<fontspec>" }
with
contents = bind_concat""
lines = bind_concat"\n"
where
function bind_concat(sep)
  --- table.concat with bound separator and `tostring` filter
  return function(t)
    local u={}
    if type(t)=='string' then return t end
    for k,v in ipairs(t) do u[k]=tostring(v) end
    return tconcat(u,sep)
  end
end
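To see how these pieces fit together, here is a minimal self-contained sketch that runs on a hand-built tree instead of real parser output. The constructor `E` is hypothetical and stands in for the tweaked parser, `tconcat` is assumed to be `table.concat`, and only a few of the tags are included.

```lua
local tconcat = table.concat   -- assumed meaning of `tconcat`

local function bind_concat(sep)
  -- table.concat with bound separator and `tostring` filter
  return function(t)
    if type(t) == 'string' then return t end
    local u = {}
    for k, v in ipairs(t) do u[k] = tostring(v) end
    return tconcat(u, sep)
  end
end

local contents = bind_concat ""
local lines = bind_concat "\n"

local render = {
  page = lines,
  text = contents,
  b = function(s) return '**'..contents(s)..'**' end,
  i = function(s) return '*'..contents(s)..'*' end,
  fontspec = "<fontspec>",
}

local element_mt = { __tostring = function(s)
  local r = render[assert(s.tag)]
  if type(r) == 'string' then return r
  elseif type(r) == 'function' then return r(s)
  else error("Can't convert type '"..s.tag.."' to a string")
  end
end }

-- Hypothetical constructor: a tagged node whose children are array entries.
local function E(tag, ...)
  return setmetatable({ tag = tag, ... }, element_mt)
end

local page = E('page',
  E('fontspec'),
  E('text', 'plain and ', E('b', 'bold')),
  E('text', E('i', 'italic'), ' tail'))

local out = tostring(page)
print(out)
```

The recursion is hidden in `tostring`: each renderer calls `tostring` on its children, which dispatches through `__tostring` back into `render`. A tag with no entry in `render` raises the error from `element_mt`.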
Note that "render" depends on the DTD. Writing a module that
can generate "render" from an arbitrary DTD was not part of my
purposes; writing a different "render" that converts to, say,
Markdown rather than plain text is, but it is not yet at the
level where I can share it.
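
As an illustration of swapping in a different "render", the sketch below re-emits each element's own tag around its contents, which is closer to the cloning asked about in the thread. It is not the Markdown renderer mentioned above. The names `xml_render`, `xml_mt`, `wrap` and `E` are hypothetical, and the sketch deliberately ignores attributes and the escaping of `&` and `<` in text.

```lua
-- Sketch only: every name here is hypothetical, not from the post.
local function contents(t)
  local u = {}
  for k, v in ipairs(t) do u[k] = tostring(v) end
  return table.concat(u)
end

-- Re-emit the element's own tag around its rendered children.
local function wrap(s)
  return '<'..s.tag..'>'..contents(s)..'</'..s.tag..'>'
end

local xml_render = { text = wrap, b = wrap, i = wrap }

local xml_mt = { __tostring = function(s)
  local r = xml_render[s.tag]
  if not r then error("no renderer for tag '"..tostring(s.tag).."'") end
  return r(s)
end }

-- Hypothetical constructor standing in for the tweaked parser.
local function E(tag, ...)
  return setmetatable({ tag = tag, ... }, xml_mt)
end

local xml = tostring(E('text', 'a ', E('b', 'bold'), ' word'))
print(xml)
```

Only the table changes between output formats; the metatable dispatch and the concatenation helper stay the same.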