cat WAR_AND_PEACE_By_LeoTolstoi.txt | tr -cs "[:alnum:]" "\n" | tr "[:lower:]" "[:upper:]" | sort -S16M | uniq -c | sort -nr | cat -n | head -n 30
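The "sort | uniq -c" version above (with "sort -S16M", or "sort -S1G" for a bigger buffer; "-S" is Linux/GNU sort only) also does the job, but it has drawbacks for bigger files because of the space/time complexity of sorting: it has to sort every single word occurrence, while the awk version below counts the words in a hash table in one pass and then only sorts the distinct words.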
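To make the complexity argument concrete, here is a small sketch that counts how many lines each approach hands to sort. The sample sentence is made up for illustration; on a real book the gap between "every occurrence" and "distinct words" is much larger.

```shell
#!/bin/sh
# Sketch: how many lines does each approach give to sort?
# The sample text is invented for illustration only.
text='the cat and the dog and the bird'

# sort | uniq -c: the first sort sees every word occurrence (one per line).
occurrences=$(printf '%s\n' "$text" |
  tr -cs '[:alnum:]' '\n' | tr '[:lower:]' '[:upper:]' | grep -c .)

# awk hash: counting happens in one pass, so sort -nr only sees
# one line per distinct word. The loop counts the keys portably.
distinct=$(printf '%s\n' "$text" |
  tr -cs '[:alnum:]' '\n' | tr '[:lower:]' '[:upper:]' |
  awk 'NF { h[$1]++ } END { n = 0; for (w in h) n++; print n }')

echo "lines sorted with sort|uniq: $occurrences"
echo "lines sorted after awk:      $distinct"
```

For this sample it prints 8 and 5: the sort-based pipeline sorts all 8 tokens, the awk pipeline only the 5 distinct words (THE, CAT, AND, DOG, BIRD).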
# get the input text (here War and Peace) from http://www.gutenberg.org
$ cat WAR_AND_PEACE_By_LeoTolstoi.txt | tr -cs "[:alnum:]" "\n" | tr "[:lower:]" "[:upper:]" | awk '{h[$1]++}END{for (i in h){print h[i]" "i}}' | sort -nr | cat -n | head -n 30
1 34720 THE
2 22300 AND
3 16753 TO
4 15007 OF
5 10608 A
6 10004 HE
7 9036 IN
8 8204 THAT
9 7984 HIS
10 7359 WAS
11 5710 WITH
12 5617 IT
13 5365 HAD
14 4725 HER
15 4697 NOT
16 4637 HIM
17 4547 AT
18 4524 I
19 4414 S
20 4054 BUT
21 4035 AS
22 4014 ON
23 3871 YOU
24 3555 FOR
25 3488 SHE
26 3347 IS
27 2842 SAID
28 2813 ALL
29 2709 FROM
30 2458 BY
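If you use this often, the awk pipeline can be wrapped in a small function. This is a sketch; the name "top_words" and its argument order are hypothetical, not part of the original command. It adds an "NF" guard because "tr -cs" emits an empty first line when the text starts with punctuation, and the original awk program would count that empty string as a word.

```shell
#!/bin/sh
# top_words: hypothetical helper wrapping the awk-based pipeline.
# Usage: top_words [N] [FILE...]   (reads stdin when no FILE is given)
top_words() {
  top_n=${1:-30}
  [ "$#" -gt 0 ] && shift
  # With no FILE operands, cat copies stdin, so this also works in a pipe.
  cat -- "$@" |
    tr -cs '[:alnum:]' '\n' |      # one word per line
    tr '[:lower:]' '[:upper:]' |   # fold case so "The" and "the" match
    awk 'NF { h[$1]++ } END { for (w in h) print h[w], w }' |  # NF skips empty lines
    sort -nr |                     # highest count first
    head -n "$top_n" |
    cat -n                         # prepend the rank
}
```

Usage: "top_words 30 WAR_AND_PEACE_By_LeoTolstoi.txt" should give the same ranking as the listing above, although words with equal counts may come out in a different order.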