cat WAR_AND_PEACE_By_LeoTolstoi.txt | tr -cs "[:alnum:]" "\n"| tr "[:lower:]" "[:upper:]" | sort -S16M | uniq -c |sort -nr | cat -n | head -n 30
The "sort | uniq -c" variant (the -S buffer-size option, e.g. "sort -S1G", is Linux/GNU sort only) will also do the job, but it has some drawbacks for bigger files, caused by the space/time complexity of sorting every word before counting.
# get some input http://www.gutenberg.org
$ cat WAR_AND_PEACE_By_LeoTolstoi.txt | tr -cs "[:alnum:]" "\n"| tr "[:lower:]" "[:upper:]" | awk '{h[$1]++}END{for (i in h){print h[i]" "i}}'|sort -nr | cat -n | head -n 30
1 34720 THE
2 22300 AND
3 16753 TO
4 15007 OF
5 10608 A
6 10004 HE
7 9036 IN
8 8204 THAT
9 7984 HIS
10 7359 WAS
11 5710 WITH
12 5617 IT
13 5365 HAD
14 4725 HER
15 4697 NOT
16 4637 HIM
17 4547 AT
18 4524 I
19 4414 S
20 4054 BUT
21 4035 AS
22 4014 ON
23 3871 YOU
24 3555 FOR
25 3488 SHE
26 3347 IS
27 2842 SAID
28 2813 ALL
29 2709 FROM
30 2458 BY
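For reuse, the awk pipeline above could be wrapped in a small shell function. This is only a sketch under a few assumptions: the name top_words is hypothetical, the input arrives on stdin rather than from a named file, an NF guard is added so that an empty first line (when the text starts with a non-alphanumeric character) is not counted as a word, and a secondary sort key is added so that ties come out in a stable order. It drops the "cat -n" rank column for brevity.

```shell
# top_words: hypothetical helper, not part of the original command.
# Reads text on stdin and prints the N most frequent words (default 30).
top_words() {
  n="${1:-30}"
  # 1. turn every run of non-alphanumeric characters into one newline
  # 2. fold to upper case so "The" and "the" are the same word
  # 3. count each word in an awk hash (no full sort of the input needed)
  # 4. sort by count (descending), then by word to break ties
  tr -cs '[:alnum:]' '\n' |
    tr '[:lower:]' '[:upper:]' |
    awk 'NF { h[$1]++ } END { for (w in h) print h[w], w }' |
    sort -k1,1nr -k2,2 |
    head -n "$n"
}

# Small inline sample instead of the full novel:
printf 'the cat and the hat and the bat\n' | top_words 2
```

On the sample line this prints "3 THE" followed by "2 AND". Only the distinct words reach sort, which is why the hash-based count tends to scale better on large files than sorting every word first.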