Hello,
I was wondering if you are aware of an efficient way to parse big compressed XML files with Lua (or LuaJIT).
The files are the Wikipedia dumps: http://dumps.wikimedia.org/enwiki/20140304/ (all the files whose names contain pages-meta-history).
Currently we do it in Java, but it takes ages to parse all those files (around 18 hours).
The parsing consists of extracting the timestamp of each revision of each page in the dump.
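
For concreteness, here is a rough, untested sketch of the kind of streaming approach I had in mind: LuaExpat's SAX callbacks fed from a bzcat pipe, so the file is never decompressed to disk or held in memory. The file name is just a placeholder, and I am assuming LuaExpat (lxp) is installed and bzcat is on the PATH.

-- Stream-parse a bzip2-compressed dump, printing every <timestamp>.
local lxp = require("lxp")

local in_timestamp, buf = false, {}

local callbacks = {
  StartElement = function(_, name)
    if name == "timestamp" then
      in_timestamp, buf = true, {}
    end
  end,
  CharacterData = function(_, text)
    -- Expat may deliver the element text in several pieces.
    if in_timestamp then buf[#buf + 1] = text end
  end,
  EndElement = function(_, name)
    if name == "timestamp" then
      in_timestamp = false
      print(table.concat(buf))  -- e.g. 2014-03-04T12:34:56Z
    end
  end,
}

local parser = lxp.new(callbacks)
local pipe = io.popen("bzcat enwiki-pages-meta-history1.xml.bz2")
for chunk in function() return pipe:read(2^16) end do
  assert(parser:parse(chunk))  -- feed the parser 64 KiB at a time
end
parser:parse()                 -- signal end of document
parser:close()
pipe:close()

Would a SAX-style approach like this be sensible, or is there something faster?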

Suggestions?

Thanks,
Valerio