
xz: Your next default compression tool

The xz compression program, run with its defaults, can be so slow that you might wonder why anybody uses it. But on a modern machine, xz -0 --threads=0 uses the least aggressive compression preset and runs on all available threads of a multi-core CPU. In the test below it beats even gzip -1 on wall-clock time while compressing better than gzip -9. One nice bonus of the xz format is that individual archives can be concatenated: run cat file*.xz > all-files.xz without decompressing first, then decompress the result in one step. Nifty! You can even null-pad the space between concatenated files if you need to block-align the archive for random access on slow media. You might consider setting your XZ_DEFAULTS environment variable to use these higher-performance options.
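Here is a minimal POSIX sh sketch of the concatenation trick, assuming xz is on your PATH. The demo_concat function and the part1/part2 files are hypothetical names chosen for illustration. The 512 bytes of padding is an arbitrary example size; what matters is that the padding length is a multiple of four bytes, which the .xz format requires for padding between streams.

```shell
#!/bin/sh
# Sketch: concatenate separately compressed .xz files, with and without
# null padding between them, then decompress each result in one step.
# part1/part2 are hypothetical stand-ins for real data.
set -e

demo_concat() {
  dir=$(mktemp -d)
  printf 'first part\n' > "$dir/part1"
  printf 'second part\n' > "$dir/part2"

  # The fast, multithreaded settings from the text.
  xz -0 --threads=0 "$dir/part1" "$dir/part2"   # creates part1.xz and part2.xz

  # Plain concatenation: no need to decompress first.
  cat "$dir/part1.xz" "$dir/part2.xz" > "$dir/all-files.xz"

  # Padded concatenation: the .xz format permits runs of null bytes
  # between streams as long as their length is a multiple of four.
  {
    cat "$dir/part1.xz"
    dd if=/dev/zero bs=512 count=1 2>/dev/null   # 512 null bytes
    cat "$dir/part2.xz"
  } > "$dir/padded.xz"

  # Each archive decompresses to both inputs, in order.
  xz -dc "$dir/all-files.xz"
  xz -dc "$dir/padded.xz"

  rm -rf "$dir"
}

demo_concat
```

To make the fast settings your default, a line such as export XZ_DEFAULTS="-0 --threads=0" in a shell startup file should work. xz reads XZ_DEFAULTS before the command-line options, so an explicit preset such as xz -9 still takes precedence.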

Let’s look at compressing this 30 MB PostScript file:

$ ls -lh 
-rw-r--r-- 1 user user 30M Nov 20  2012

$ time gzip -9 -k

real    0m1.828s
user    0m1.697s
sys     0m0.029s

$ ls -lh
-rw-r--r-- 1 user user 28M Nov 20  2012

$ time xz -0 --threads=0

real    0m1.250s
user    0m8.781s
sys     0m0.077s

$ ls -lh
-rw-r--r-- 1 user user 27M Nov 20  2012

$ time gzip -1 -k

real    0m1.365s
user    0m1.330s
sys     0m0.031s

$ ls -lh
-rw-r--r-- 1 user user 28M Nov 20  2012

$ time xz

real    0m1.974s
user    0m9.843s
sys     0m0.122s

$ ls -lh
-rw-r--r-- 1 user user 26M Nov 20  2012

$ time xz -9 --threads=0

real    0m13.182s
user    0m12.980s
sys     0m0.190s

$ ls -lh
-rw-r--r-- 1 user user 23M Nov 20  2012

$ time xz -6 --threads=1

real    0m13.615s
user    0m13.494s
sys     0m0.081s

$ ls -lh
-rw-r--r-- 1 user user 24M Nov 20  2012
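To repeat this comparison on another file, here is a small bash sketch. It assumes gzip and xz are on your PATH (lzip is used only if it is installed); bench and run_all are hypothetical helper names. It streams the input through each compressor via stdin and stdout, so -k/--keep is not needed. It reports only the wall-clock time of a single run, which is noisy, so repeat runs before drawing conclusions. Any XZ_DEFAULTS or XZ_OPT setting in your environment also applies to these runs.

```shell
#!/usr/bin/env bash
# Sketch of a harness for repeating the comparison above on another file.
# Each compressor reads the input on stdin and writes to a temporary file,
# so the original is never modified.
set -euo pipefail

# bench INPUT COMMAND [ARGS...]
# Prints the command, its wall-clock ("real") time and the compressed size.
bench() {
  local input=$1
  shift
  local out timing size
  local TIMEFORMAT=%R   # make the time keyword print only real seconds
  out=$(mktemp)
  # time reports on the group's stderr; the command's own stderr is discarded.
  timing=$( { time "$@" < "$input" > "$out" 2>/dev/null; } 2>&1 )
  size=$(wc -c < "$out" | tr -d ' ')
  rm -f "$out"
  printf '%-22s %8ss %12s bytes\n' "$*" "$timing" "$size"
}

# run_all INPUT: the settings used in the transcript above.
run_all() {
  local input=$1
  bench "$input" gzip -1
  bench "$input" gzip -9
  bench "$input" xz -0 --threads=0
  bench "$input" xz -9 --threads=0
  bench "$input" xz -6 --threads=1
  # lzip is optional; skip it when it is not installed.
  if command -v lzip > /dev/null; then
    bench "$input" lzip -1
    bench "$input" lzip -9
  fi
}

# Usage: bash bench.sh file.ps
if [[ $# -ge 1 ]]; then
  run_all "$1"
fi
```

Run it as, for example, bash bench.sh file.ps (the script name is arbitrary).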

As you can see, the fastest xz beats both the compression ratio of the slowest gzip and the speed of the fastest gzip by using multiple cores (the test machine is a portable i7 with 4 cores and 8 threads). Allowing slightly more time than gzip gives slightly better compression, and allowing significantly more time than gzip yields marginally better compression still. But note the last test: with the xz defaults (-6, single-threaded), the time is the worst and the compression isn’t the best. Compared with gzip -1 or xz -0 --threads=0, the default output is about 11 to 14% smaller but takes roughly 10 to 11 times as long. These defaults will only hinder the adoption of xz, which is a shame, because faster defaults would attract more users and already provide better results than gzip on typical workloads on modern CPUs.
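The trade-off in the last test can be checked with a little arithmetic. This sketch uses the real times from the transcript and the rounded sizes that ls -lh printed, so the percentages are approximate; tradeoff is a hypothetical helper name.

```shell
#!/bin/sh
# Worked arithmetic for the trade-off above: how much smaller and how much
# slower the default xz run was than the two fast runs. Sizes are the
# rounded MB values from ls -lh, times are the "real" values.
set -e

tradeoff() {
  awk 'BEGIN {
    def_size = 24; def_time = 13.615   # xz -6 --threads=1, the xz defaults
    report("gzip -1", 28, 1.365)
    report("xz -0 --threads=0", 27, 1.250)
  }
  # Print the size reduction and slowdown relative to one fast run.
  function report(name, sz, secs) {
    printf "vs %-18s %4.1f%% smaller, %4.1fx slower\n", name, (1 - def_size / sz) * 100, def_time / secs
  }'
}

tradeoff
```

It prints 14.3% smaller and 10.0x slower against gzip -1, and 11.1% smaller and 10.9x slower against xz -0 --threads=0.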

UPDATE 20170825: Antonio Diaz Diaz asked me to look at lzip as well. Here are the results I got:

$ time lzip -1 --keep

real    0m10.923s
user    0m9.993s
sys     0m0.037s

$ ls -lh 
-rw-rw-r-- 1 user user 26M Apr  6  2013

$ time lzip -9 --keep 

real    0m16.348s
user    0m16.126s
sys     0m0.167s

$ ls -lh 
-rw-rw-r-- 1 user user 23M Apr  6  2013

$ time xz -9 --threads=1 --keep

real    0m14.867s
user    0m14.559s
sys     0m0.220s

$ ls -lh
-rw-rw-r-- 1 user user 23M Apr  6  2013

As you can see, lzip’s compression is better than gzip’s and about the same as xz’s. xz beats lzip on speed, both single-threaded and (more obviously) multithreaded, while gzip is an order of magnitude faster than lzip but compresses less. Antonio’s lzip page points out that defects in busybox’s unxz prevent it from catching CRC errors, so keep that in mind if you’re working on embedded systems. I would encourage the busybox developers to put data integrity ahead of performance.
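Here is a sketch of the kind of integrity check at stake. It compresses a hypothetical sample file, confirms that xz --test accepts it, flips one byte in the middle of the archive, and confirms that xz --test now rejects it. It exercises the regular xz tool only and says nothing about busybox’s unxz, which would have to be tested the same way on the target device; lzip offers the equivalent lzip --test. The check_integrity function and sample.txt are illustrative names.

```shell
#!/bin/sh
# Sketch: check that xz reports a damaged archive instead of silently
# producing bad output. Uses the stock xz tool only.
set -e

check_integrity() {
  dir=$(mktemp -d)
  seq 1 100000 > "$dir/sample.txt"
  xz "$dir/sample.txt"                 # creates sample.txt.xz
  archive="$dir/sample.txt.xz"

  # --test decompresses without writing output and verifies the integrity
  # check stored in the file (CRC64 unless another one was chosen with -C).
  if xz --test "$archive"; then echo "intact archive: OK"; fi

  # Replace one byte in the middle of the compressed data with a different value.
  offset=$(( $(wc -c < "$archive") / 2 ))
  old=$(od -An -tu1 -j "$offset" -N1 "$archive" | tr -d ' \n')
  new=$(( (old + 1) % 256 ))
  printf "\\$(printf '%03o' "$new")" |
    dd of="$archive" bs=1 seek="$offset" conv=notrunc 2>/dev/null

  if xz --test "$archive" 2>/dev/null; then
    echo "corrupted archive: NOT detected"
  else
    echo "corrupted archive: detected"
  fi
  rm -rf "$dir"
}

check_integrity
```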