gzip vs bzip2 at compressing random strings?

< /dev/urandom tr -dc A-Za-z0-9_ | head -c $((1024 * 1024)) | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null
Does that count as a win for bzip2?
Sample Output
$ \ls -l out*
-rw-r--r-- 1 chillu users 789711 2009-04-04 19:01 out.bz
-rw-r--r-- 1 chillu users 789947 2009-04-04 19:01 out.gz

By: jnash
2009-04-04 13:23:01

What Others Think

I use bzip2 all the time too, even though it's much slower, thinking it compresses way better. But what's that, like .01% smaller? I think I'm reverting to gzip.
cbilson · 587 weeks and 6 days ago
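For reference, the actual gap can be worked out from the two sizes in the sample output. This is a small sketch using awk for the floating-point arithmetic; the variable names are only illustrative.

```shell
# Sizes in bytes, copied from the sample output above.
gz=789947   # out.gz
bz=789711   # out.bz

# Percentage by which the bzip2 file is smaller than the gzip file.
saving=$(awk -v gz="$gz" -v bz="$bz" 'BEGIN { printf "%.3f", (gz - bz) / gz * 100 }')
echo "bzip2 output is ${saving}% smaller than gzip output"
```

That is a difference of 236 bytes out of roughly 790,000, so about 0.03%.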
I use 'lzma --fast'. The speed is comparable to gzip and the compression is better than bzip2.
atoponce · 587 weeks and 6 days ago
Love the one-liner on its own merits, but I disagree with the method. The data produced here isn't a fair representation. Random data like this (even though it is drawn from a smaller gamut of 63 characters rather than the full 256 byte values) gives a result more akin to compressing an already compressed file.
peterc · 587 weeks and 6 days ago
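A rough way to see why both tools land so close together: if each character is drawn uniformly and independently from 63 symbols, it carries log2(63) bits but is stored in 8. The sketch below computes the resulting lower bound for a 1 MiB sample, assuming an ideal uniform source (which `tr` on /dev/urandom should approximate closely).

```shell
# Lower bound in bytes for 1 MiB of characters drawn uniformly from 63 symbols.
# log(63) / log(256) equals log2(63) / 8, the fraction of each byte that is information.
bound=$(awk 'BEGIN { printf "%d", 1048576 * log(63) / log(256) }')
echo "no lossless compressor can average below about $bound bytes"
```

The bound comes out at roughly 783,000 bytes, so both out.gz and out.bz in the sample output are already within about 1% of the best possible result.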
I don't know about binary files, but for text files bzip2 is significantly better. When I ran the following to compress the human genome DNA sequence (3,010 MB total size) with gzip and bzip2:
cat human_genome | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null
the sizes of the output files were:
out.gz 892MB
out.bz 782MB
alperyilmaz · 587 weeks and 6 days ago
@peterc: Of course. That's why I titled it gzip vs bzip2 at compressing _random_ strings. This wasn't a technical comparison. I was just curious about what would win on random strings :)
@atoponce: Ah. Then you might want to add it to the race like so: >(lzma --fast ...)
jnash · 587 weeks and 6 days ago
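One way to add more contenders without growing the tee line is to save the sample once and loop over the tools. This is only a sketch: `sample.txt` and the tool list are illustrative choices, `xz` stands in for the LZMA family (XZ Utils superseded the older LZMA Utils), and every tool runs at its default level rather than with `--fast`.

```shell
#!/usr/bin/env bash
# Save one sample so that every compressor sees exactly the same input.
sample=sample.txt   # hypothetical file name
LC_ALL=C tr -dc 'A-Za-z0-9_' < /dev/urandom | head -c $((1024 * 1024)) > "$sample"

for tool in gzip bzip2 xz; do
    # Skip tools that are not installed on this machine.
    command -v "$tool" > /dev/null || continue
    # -c writes the compressed stream to stdout; wc -c counts its bytes.
    size=$(( $("$tool" -c < "$sample" | wc -c) ))
    printf '%-6s %d\n' "$tool" "$size"
done
```

Compared with the tee version, this runs the compressors one after another, so each size is final by the time it is printed.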
There are many types of random though. Random strings of normal distribution and format and really, really random strings. However, @alperyilmaz's experiment with DNA sequences is interesting. I guess DNA sequences aren't as random as I thought!

Just for fun, I wondered what the performance would be like on the first sort of "random", say Markov chain generated text:
./markov | head -c 10000 | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null
(markov generates several hundred Markov-generated sentences using Macbeth as the source text.)

I tried it three times. Across the three runs, the bzip2 file was 90%, 91%, and 90% of the size of the gz file. So not as amazing as I'd have expected (and worse than the DNA sequence!!)

Point of all of this? God knows but it was fun for a few minutes ;-)
peterc · 587 weeks and 5 days ago
And for those people voting this down, I think it's worth voting up not for what it specifically does, but because the latter part of the line provides a nice technique for comparing two compression techniques nonetheless :)
peterc · 587 weeks and 5 days ago
