count occurrences of each word in the novel David Copperfield

wget -q -O- http://www.gutenberg.org/dirs/etext96/cprfd10.txt | sed '1,419d' | tr "\n" " " | tr " " "\n" | perl -lpe 's/\W//g;$_=lc($_)' | grep "^[a-z]" | awk 'length > 1' | sort | uniq -c | awk '{print $2"\t"$1}'
This command might not be useful for most of us; I just wanted to share it to show the power of the command line. It downloads the plain-text version of the novel David Copperfield from Project Gutenberg, drops the first 419 lines (the Project Gutenberg header material), and turns the rest into a single column of words, stripping punctuation and converting everything to lowercase. The occurrences of each word are then counted with the sort | uniq -c combination. Numbers and single characters are excluded from the count. I'm sure you can write a shorter version.
Sample Output
aback	1
abandon	6
abandoned	13
abase	1
abased	1
abashed	7
abated	1
abatement	1
...

2009-05-04 16:00:39

What Others Think

Is it a joke? curl http://www.gutenberg.org/dirs/etext96/cprfd10.txt|awk -v RS='[^a-zA-Z0-9]' /./'{a[$1]++}END{for (i in a) print a[i], i|"sort -n"}'
point_to_null · 552 weeks and 5 days ago
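The awk one-liner above packs a lot into one program, so here is a spread-out reading of it. This is a sketch with a hypothetical function name (`count_words_awk`) that reads from stdin; it relies on awk accepting a regular expression as `RS`, which gawk and mawk do, while POSIX leaves a multi-character `RS` unspecified. Every single non-alphanumeric character ends a record, so each record is one word, and `/./` skips the empty records that appear between consecutive separators.

```shell
#!/bin/sh
# Annotated sketch of point_to_null's awk version; count_words_awk is a hypothetical name.
# Assumes an awk that treats a multi-character RS as a regular expression (gawk, mawk).
count_words_awk() {
    awk -v RS='[^a-zA-Z0-9]' '
        /./ { a[$1]++ }                     # each non-empty record is one word; count it (case-sensitive)
        END {
            for (i in a)
                print a[i], i | "sort -n"   # "count word", least frequent first
        }
    '
}
```

Note that counting is case-sensitive and digits are kept, which is the difference jestin points out below.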
I like the posted command better than the one by point_to_null. While point_to_null's is simpler and shorter, it does not strip out numbers and single characters. The download stats are nice, but not really an improvement.
jestin · 552 weeks and 5 days ago
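jestin's objection can be addressed without giving up the single awk call. The sketch below (hypothetical function name `count_words_filtered`, again assuming an awk with regular-expression `RS` such as gawk or mawk) lowercases each word, keeps only words that start with a letter and are longer than one character, and prints `word<TAB>count` sorted by word like the original. It splits on every non-alphanumeric character, whereas the original pipeline deletes punctuation inside a token, so a word such as "don't" is counted differently and the totals will not match the original exactly.

```shell
#!/bin/sh
# Sketch: single-awk word count that lowercases words and skips numbers and single
# characters. count_words_filtered is a hypothetical name; a regular-expression RS
# is assumed (gawk, mawk).
count_words_filtered() {
    awk -v RS='[^a-zA-Z0-9]+' '
        { w = tolower($0) }                       # normalise case before counting
        w ~ /^[a-z]/ && length(w) > 1 { n[w]++ }  # skip numbers, empty records, one-letter words
        END { for (w in n) printf "%s\t%d\n", w, n[w] }
    ' | sort                                      # alphabetical, like the original output
}
```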
@point_to_null: Wow! I wouldn't have imagined this could be done with a command as short as yours. You must be a command-line guru.
alperyilmaz · 552 weeks and 5 days ago
Nice one! point_to_null gets points for doing it with fewer pipes. alperyilmaz gets points for using more tools though (in order: wget, sed, tr, tr, perl, grep, awk, sort, uniq, awk) showing people what piping really means.
bwoodacre · 552 weeks and 5 days ago
what the...
linuxrawkstar · 552 weeks and 5 days ago
...exactly. This doesn't make much sense. Any explanation of why someone would want to count the words in this novel?
Alanceil · 552 weeks and 4 days ago
@Alanceil something like this could be modified to find occurrences of words or letters following each other to create a Markov chain modeled after a given text. That would be useful for a text generator if you needed one...
leon · 552 weeks and 3 days ago
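As a rough illustration of leon's idea, the raw material for a word-level Markov chain is a count of which word follows which. The sketch below (hypothetical function name `count_bigrams`) uses the same kind of tools: it splits on anything that is not a letter, lowercases, pairs each word with the next one, and counts the pairs, most frequent first. It ignores sentence boundaries, so the last word of one sentence is paired with the first word of the next; a real text generator would probably want to handle that.

```shell
#!/bin/sh
# Sketch: count adjacent word pairs (bigrams), the raw counts behind a word-level
# Markov chain. count_bigrams is a hypothetical name. Sentence boundaries are ignored.
count_bigrams() {
    tr -cs 'A-Za-z' '\n' |     # every run of non-letters becomes one newline: one word per line
    tr 'A-Z' 'a-z' |           # lowercase
    awk 'NF {                  # skip the empty line produced by leading non-letters
             if (prev != "") print prev, $0   # emit "previous current" for every word after the first
             prev = $0
         }' |
    sort | uniq -c |           # count identical pairs
    sort -rn                   # most frequent pairs first
}
```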
tag clouds, semantic analysis, you know, stuff like that.
mondotofu · 515 weeks ago
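For a tag cloud, what usually matters is the most frequent words rather than the alphabetical list. A minimal sketch, assuming input in the `word<TAB>count` format of the sample output above (the function name `top_words` and its default of 10 are hypothetical), sorts on the count column in descending order and keeps the first N lines.

```shell
#!/bin/sh
# Sketch: pick the N most frequent words from "word<TAB>count" lines.
# top_words and its default N=10 are hypothetical.
top_words() {
    tab=$(printf '\t')
    sort -t "$tab" -k2,2nr -k1,1 |   # by count, highest first; ties broken alphabetically
    head -n "${1:-10}"               # keep the first N lines
}
```

Piping the original command into `top_words 50` would give a list that can then be weighted by count.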

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.
