We have a big cluster, but to exploit it for this task we would need some map-reduce setup, which we don't have.
OTOH, there are those uber-compressed .7z files, orders of magnitude smaller than the bz2 ones. I wonder if I can inflate those files with bzcat as well...

I've only found this script which seems to provide a "7zcat" tool:
http://pseudoscripter.wordpress.com/2011/03/25/writing-7zcat-works-like-zcat-on-7z-files/

PATH=${GZIP_BINDIR-'/bin'}:$PATH
exec 7z e -so -bd "$@" 2>/dev/null | cat  
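
As far as I know, bzcat only understands bzip2 streams, so the .7z files would have to go through 7z itself. Here is a minimal sketch of the same idea as a shell function, assuming the p7zip `7z` binary is installed. The name `sevenzcat` and the file name in the usage comment are made up for illustration, not taken from the linked script.

```shell
#!/bin/sh
# Hypothetical helper: stream the contents of a .7z archive to stdout.
sevenzcat() {
    # Bail out early if the 7z binary is not on PATH.
    if ! command -v 7z >/dev/null 2>&1; then
        echo "sevenzcat: 7z not found in PATH" >&2
        return 127
    fi
    # e   = extract
    # -so = write the extracted data to stdout
    # -bd = disable the progress indicator
    7z e -so -bd "$@" 2>/dev/null
}

# Usage (hypothetical file name), timed the same way as the bzcat run:
#   time sevenzcat some-dump.7z > /dev/null
```

Note that stderr is discarded here, as in the original one-liner, so a corrupt archive would fail silently; dropping the `2>/dev/null` might be safer while experimenting.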


It'd be interesting to see if you get better results on your hardware... 

On Mon, Apr 7, 2014 at 12:02 AM, Petite Abeille <petite.abeille@gmail.com> wrote:

On Apr 4, 2014, at 12:00 AM, Valerio Schiavoni <valerio.schiavoni@gmail.com> wrote:

> 18 hours is the cumulative time for _all_ the files , not 18 hours per file :-)

Aha… makes more sense… ok, so, as of April 4th, there were 161 'pages-meta-history' files, ranging in size from 80 MB to 31 GB…

Looking at the largest compressed file, it takes a whopping 5 hours to inflate on my consumer-grade system:

$ time bzcat < enwiki-20140304-pages-meta-history16.xml-p005043453p005137507.bz2 > /dev/null

real    309m38.471s
user    305m0.095s
sys     1m55.005s
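
For a rough sense of scale, the timing above can be turned into a throughput figure. This is only a back-of-the-envelope sketch: it assumes the file timed here is the 31 GB one and that "GB" means 1024 MiB. The function name `throughput_mb_s` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compressed input consumed per second.
# Args: compressed size in MiB, elapsed wall-clock time in seconds.
throughput_mb_s() {
    awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", size / secs }'
}

# 31 GB taken as 31 * 1024 = 31744 MiB (assumption);
# real 309m38.471s = 309 * 60 + 38.471 = 18578.471 s.
throughput_mb_s 31744 18578.471
```

Under those assumptions bzcat is chewing through well under 2 MiB of compressed input per second. Since user time is nearly equal to real time, the run looks CPU-bound on a single core rather than disk-bound.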

A bit overwhelming for my little setup. I hope you have a big hardware budget :D