Hello,
I was wondering if you are aware of an efficient way to parse big compressed XML files with Lua (or LuaJIT).
The files are the Wikipedia dumps: http://dumps.wikimedia.org/enwiki/20140304/ (all the files whose names contain pages-meta-history).
Currently we do it in Java, but it takes ages to parse all those files (around 18 hours).
The parsing consists of extracting the timestamp of each revision of each page in the dump.
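
For concreteness, here is a rough, untested sketch of the kind of streaming approach I had in mind: LuaExpat's SAX callbacks fed from a bzcat pipe, so the file is never decompressed to disk or held in memory. The file name is just a placeholder, and I am assuming LuaExpat (lxp) is installed and bzcat is on the PATH.

-- Stream-parse a bzip2-compressed dump, printing every <timestamp>.
local lxp = require("lxp")

local in_timestamp, buf = false, {}

local callbacks = {
  StartElement = function(_, name)
    if name == "timestamp" then
      in_timestamp, buf = true, {}
    end
  end,
  CharacterData = function(_, text)
    -- Expat may deliver the element text in several pieces.
    if in_timestamp then buf[#buf + 1] = text end
  end,
  EndElement = function(_, name)
    if name == "timestamp" then
      in_timestamp = false
      print(table.concat(buf))  -- e.g. 2014-03-04T12:34:56Z
    end
  end,
}

local parser = lxp.new(callbacks)
local pipe = io.popen("bzcat enwiki-pages-meta-history1.xml.bz2")
for chunk in function() return pipe:read(2^16) end do
  assert(parser:parse(chunk))  -- feed the parser 64 KiB at a time
end
parser:parse()                 -- signal end of document
parser:close()
pipe:close()

Would a SAX-style approach like this be sensible, or is there something faster?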

Suggestions?

Thanks,
Valerio