
What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again.

Delete that bloated snippets file you've been using and share your personal repository with the world. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

If you have a new feature suggestion or find a bug, please get in touch via http://commandlinefu.uservoice.com/

Get involved!

You can sign in using OpenID credentials, or register a traditional username and password.

First-time OpenID users will be automatically assigned a username which can be changed after signing in.


Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are separate Twitter accounts for commands that get a minimum of 3 and 10 votes respectively - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams, as well as feeds for virtually every other subset (users, tags, functions, …).



News

2011-03-12 - Confoo 2011 presentation
Slides are available from the commandlinefu presentation at Confoo 2011: http://presentations.codeinthehole.com/confoo2011/
2011-01-04 - Moderation now required for new commands
To try and put an end to the spamming, new commands now require moderation before they appear on the site.
2010-12-27 - Apologies for not banning the trolls sooner
Have been away from the interwebs over Christmas. Will be more vigilant henceforth.
2010-09-24 - OAuth and pagination problems fixed
Apologies for the delay in getting Twitter's OAuth supported. Annoying pagination gremlin also fixed.

gzip vs bzip2 at compressing random strings?

< /dev/urandom tr -dc A-Za-z0-9_ | head -c $((1024 * 1024)) | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null
2009-04-04 13:23:01
User: jnash
Functions: bzip2 gzip head tee tr
Votes: -2

Does that count as a win for bzip2?
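
To read the result off, compare the two output files once the pipeline finishes. A minimal check, assuming GNU coreutils stat (on BSD/macOS the flags differ, so plain ls -l out.gz out.bz works everywhere):

stat -c '%n: %s bytes' out.gz out.bz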


What others think

I use bzip2 all the time too, even though it's much slower, thinking it compresses way better. But what's that, like 0.01% smaller? I think I'm reverting to gzip.

Comment by cbilson 285 weeks and 2 days ago

I use 'lzma --fast'. The speed is comparable to gzip and the compression is better than bzip2.

Comment by atoponce 285 weeks and 2 days ago

Love the one-liner on its own merits, but disagree with the method. The data produced here isn't a fair representation: such random data (albeit drawn from a smaller gamut of 63 characters rather than the full 256) gives a result more akin to compressing an already-compressed file.

Comment by peterc 285 weeks and 1 day ago

I don't know about the case for binary files, but for text files bzip2 is significantly better. I ran the following to compress a human genome DNA sequence (3,010 MB in total) with both gzip and bzip2:

cat human_genome | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null

The sizes of the output files are:

out.gz 892MB

out.bz 782MB

Comment by alperyilmaz 285 weeks and 1 day ago
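
For scale: those figures put gzip at about 29.6% of the original 3,010 MB and bzip2 at about 26.0%, i.e. the bzip2 file is roughly 12% smaller than the gzip one. A quick awk check of the arithmetic:

awk 'BEGIN { printf "gzip: %.1f%%  bzip2: %.1f%%  bz2/gz ratio: %.3f\n", 892/3010*100, 782/3010*100, 782/892 }'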

@peterc: Of course. That's why I titled it gzip vs bzip2 at compressing _random_ strings. This wasn't any technical comparison; I was just curious about what'd win on random strings :)

@atoponce: Ah. Then you might want to add it to the race like so: >(lzma --fast ...) (a fuller version is sketched after this comment)

Comment by jnash 285 weeks and 1 day ago
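
Spelling out jnash's suggestion: lzma (from xz-utils) reads stdin and writes to stdout with -c, so a three-way race might look like this (out.lzma is just an assumed output name):

< /dev/urandom tr -dc A-Za-z0-9_ | head -c $((1024 * 1024)) | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) >(lzma --fast -c > out.lzma) > /dev/null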

There are many types of random, though: strings that are "random" but follow the distribution and format of normal text, and really, really random strings. However, @alperyilmaz's experiment with DNA sequences is interesting. I guess DNA sequences aren't as random as I thought!

Just for fun, I wondered what the performance would be like on the first sort of "random", say Markov-chain-generated text:

./markov | head -c 10000 | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null

(markov generates several hundred Markov-chain sentences using Macbeth as the source text; a rough stand-in is sketched after this comment)

I tried it three times; across the runs, the bzip2 file was 90%, 91%, and 90% of the size of the gzip file. So not as amazing as I'd have expected (and worse than on the DNA sequence!)

Point of all of this? God knows but it was fun for a few minutes ;-)

Comment by peterc 285 weeks ago
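
peterc's ./markov script isn't shown anywhere in the thread. For anyone who wants to reproduce the experiment, a rough stand-in is an order-1 word-level Markov chain in awk; macbeth.txt and the whole script are hypothetical, not peterc's actual code:

awk '
BEGIN { srand() }
{
    # record every transition prev -> word; prev starts out empty
    for (i = 1; i <= NF; i++) {
        word = $i
        chain[prev, ++count[prev]] = word
        prev = word
    }
}
END {
    # walk the chain from the first word seen, emitting ~2000 words
    w = chain["", 1]
    for (n = 0; n < 2000; n++) {
        printf "%s ", w
        if (count[w] == 0) { w = chain["", 1]; continue }
        w = chain[w, int(rand() * count[w]) + 1]
    }
    print ""
}' macbeth.txt | head -c 10000 | tee >(gzip -c > out.gz) >(bzip2 -c > out.bz) > /dev/null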

And for those people voting this down: I think it's worth voting up not for what it specifically does, but because the latter part of the line is a nice technique for comparing two compression tools on the same input :)

Comment by peterc 285 weeks ago
