lua-users home
lua-l archive

On 4/7/2014 7:03 AM, Valerio Schiavoni wrote:
And for your curiosity, on one of the smaller files, I get
significant differences between 7z and bz2:

$ time 7z e -so -bd enwiki-20140304-pages-meta-history8.xml-p000662352p000665000.7z 2>/dev/null > /dev/null
7z e -so -bd enwiki-20140304-pages-meta-history8.xml-p000662352p000665000.7z  6.10s user 0.02s system 99% cpu 6.120 total

$ time bzcat < enwiki-20140304-pages-meta-history8.xml-p000662352p000665000.bz2 > /dev/null
bzcat < enwiki-20140304-pages-meta-history8.xml-p000662352p000665000.bz2 >  61.26s user 0.14s system 99% cpu 1:01.41 total

It's strange that Wikipedia has not moved to xz and is still using a mix of 7z and bzip2; even the Linux kernel has moved to tar.xz. Both 7z and xz support the newer LZMA2.

Unfortunately, BWT+Huffman in bzip2 takes roughly symmetrical time for compression and decompression. 7z/xz is asymmetrical: decompression is much cheaper than compression, and will always be a lot faster than bzip2 at big block sizes. For multiple runs, recompressing the whole kaboodle to 7z/xz will probably improve your runtimes greatly. Something like lzo is a lot faster still, but will likely compress to only 50% or so for text data.
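
A rough sketch of the one-off recompression, assuming the bzip2 and xz command-line tools are installed. The function name recompress_to_xz is made up for illustration; the original .bz2 is left in place so you can compare timings afterwards.

```shell
#!/bin/sh
# recompress_to_xz FILE.bz2
# Hypothetical helper: streams a .bz2 file through xz without ever
# writing the uncompressed data to disk. Keeps the original .bz2.
recompress_to_xz() {
    src=$1
    dst=${src%.bz2}.xz
    # Write to a temporary name first, so an interrupted run does not
    # leave a truncated file under the final .xz name.
    # Note: in plain sh the pipeline status is that of xz only; a corrupt
    # .bz2 should be checked separately with "bzip2 -t" if in doubt.
    if bzip2 -dc -- "$src" | xz -c > "$dst.tmp"; then
        mv -- "$dst.tmp" "$dst"
    else
        rm -f -- "$dst.tmp"
        return 1
    fi
}
```

You pay the slow bzip2 decompression and the xz compression once per file, e.g. `for f in *.bz2; do recompress_to_xz "$f"; done`, and every later run can read the data back with `xz -dc file.xz`.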

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia