We have a big cluster, but to exploit it for this task we might need some map-reduce setup, which we don't have. OTOH, there are those über-compressed .7z files, orders of magnitude smaller than the bz2 ones. I wonder if I can inflate those files with bzcat as well... I've only found this script, which seems to provide a "7zcat" tool:
PATH=${GZIP_BINDIR-'/bin'}:$PATH
exec 7z e -so -bd "$@" 2>/dev/null | cat
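For what it's worth, bzcat only understands the bzip2 format, so the .7z archives would have to go through 7z itself. Here is a rough sketch of a helper that picks the tool by file extension, reusing the same `7z e -so -bd` invocation as the script above. The name `anycat` is made up for illustration, and I have not timed it against the real dumps:

```shell
#!/bin/sh
# Hypothetical helper (the name "anycat" is an assumption, not an existing tool):
# stream a file to stdout, choosing the decompressor by extension.
anycat() {
    case "$1" in
        # Same flags as the 7zcat script: extract to stdout, no progress bar.
        *.7z)  7z e -so -bd "$1" 2>/dev/null ;;
        # bzcat handles bzip2 streams only.
        *.bz2) bzcat "$1" ;;
        # Anything else is passed through untouched.
        *)     cat "$1" ;;
    esac
}
```

Something like `time anycat some-dump.7z > /dev/null` would then give numbers comparable to the bzcat timings.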
It'd be interesting to see if you get better results on your hardware...

On Mon, Apr 7, 2014 at 12:02 AM, Petite Abeille <petite.abeille@gmail.com> wrote:
Aha… makes more sense… ok, so, as of April 4th, there were 161 'pages-meta-history' files, ranging in size from 80 MB to 31 GB…
On Apr 4, 2014, at 12:00 AM, Valerio Schiavoni <valerio.schiavoni@gmail.com> wrote:
> 18 hours is the cumulative time for _all_ the files, not 18 hours per file :-)
Looking at the largest compressed file, it takes a whopping 5 hours to inflate on my consumer-grade system:
$ time bzcat < enwiki-20140304-pages-meta-history16.xml-p005043453p005137507.bz2 > /dev/null
real 309m38.471s
user 305m0.095s
sys 1m55.005s
A bit overwhelming for my little setup. I hope you have a big hardware budget :D
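Since real and user time are nearly equal in the run above, bzcat looks CPU-bound on a single core, so on a multi-core box it might pay off to inflate several files at once. A minimal sketch, assuming a `find` and `xargs` that support `-print0`, `-0` and `-P` (GNU and BSD both do); the function name `inflate_all` and the default of 4 jobs are my own placeholders:

```shell
#!/bin/sh
# Hypothetical helper: inflate every .bz2 file under a directory,
# several at a time, discarding the output (timing/integrity check only).
inflate_all() {
    dir=$1
    njobs=${2:-4}   # number of parallel bzcat processes (assumed default)
    find "$dir" -name '*.bz2' -print0 |
        xargs -0 -n 1 -P "$njobs" \
            sh -c 'bzcat "$1" > /dev/null && echo "ok $1"' sh
}
```

Running `time inflate_all /path/to/dumps 8` would show whether the cumulative wall-clock time drops with more jobs; whether it does depends on core count and disk throughput.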