- Subject: Re: Parsing big compressed XML files
- From: Petite Abeille <petite.abeille@...>
- Date: Mon, 7 Apr 2014 00:02:52 +0200
On Apr 4, 2014, at 12:00 AM, Valerio Schiavoni <valerio.schiavoni@gmail.com> wrote:
> 18 hours is the cumulative time for _all_ the files, not 18 hours per file :-)
Aha… makes more sense… ok, so, as of April 4th, there were 161 'pages-meta-history' files, ranging in size from 80 MB to 31 GB…
Looking at the largest compressed file, it takes a whopping 5 hours just to decompress on my consumer-grade system:
$ time bzcat < enwiki-20140304-pages-meta-history16.xml-p005043453p005137507.bz2 > /dev/null
real 309m38.471s
user 305m0.095s
sys 1m55.005s
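If I've done the arithmetic right, that's about 31 GB / 18,578 s ≈ 1.7 MB/s of compressed input. And with user time nearly equal to real time, the job is entirely CPU-bound on a single core.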
A bit overwhelming for my little setup. I hope you have a big hardware budget :D
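For what it's worth, stock bzip2 only ever uses one core, so a parallel decompressor might claw back a good chunk of that time. A minimal sketch, assuming lbzip2 is installed (unlike stock bzip2, it can split a standard .bz2 stream across all available cores at decompression time):

$ # hypothetical rerun of the same measurement, decompressing on all cores
$ time lbzip2 -dc enwiki-20140304-pages-meta-history16.xml-p005043453p005137507.bz2 > /dev/null

On an N-core box that could in principle approach an N-fold speedup, though I haven't benchmarked it on this file.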