find the biggest files recursively, no matter how many

find . -type f -printf '%20s %p\n' | sort -n | cut -b22- | tr '\n' '\000' | xargs -0 ls -laSr
This command finds the biggest files recursively under a directory, no matter how many there are. The usual commands ("find -type f -exec ls -laSr {} +" or "find -type f -print0 | xargs -0 ls -laSr") sort incorrectly once the file list exceeds the command-line argument limit: find or xargs then splits the list across several ls invocations, and each invocation sorts only its own batch. This command has sort order the whole list first, so ls receives the files already in size order and the combined output stays sorted.
Sample Output
-rw-r--r-- 1 fsilveira fsilveira 542537728 2001-10-03 19:29 ./rh72/enigma-SRPMS-disc2.iso
-rw-r--r-- 1 fsilveira fsilveira 624476160 2001-10-03 19:35 ./rh72/enigma-docs.iso
-rw-r--r-- 1 fsilveira fsilveira 669429760 2001-10-03 19:24 ./rh72/enigma-i386-disc2.iso
-rw-r--r-- 1 fsilveira fsilveira 677961728 2001-10-03 19:22 ./rh72/enigma-i386-disc1.iso
-rw-r--r-- 1 fsilveira fsilveira 680282112 2001-10-03 19:27 ./rh72/enigma-SRPMS-disc1.iso
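
If only the sizes and paths are needed, the same idea (let sort do the ordering rather than ls) can be sketched without ls at all. This is a minimal sketch, not the author's command: the function name biggest_files and its two arguments are made up for illustration, and it assumes GNU find for -printf.

```shell
# Hypothetical helper (not from the original post); assumes GNU find.
# Usage: biggest_files [directory] [count]
biggest_files() {
    dir=${1:-.}
    count=${2:-10}
    # %s is the size in bytes, %p the path. sort reads the list from a
    # pipe, so the command-line argument limit never comes into play.
    find "$dir" -type f -printf '%s\t%p\n' | sort -rn | head -n "$count"
}
```

Calling "biggest_files /var/log 5" would list the five largest files under /var/log, largest first, as size and path separated by a tab. Like the original, this splits on newlines, so a filename containing a newline will not be handled correctly.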

10
By: fsilveira
2009-08-13 13:13:33

1 Alternative

  • A different approach to the problem: maintain a small sorted list, print the largest file found so far as we go, then print the top 10 at the end. I often find that the find and sort take a long time, and a large file might appear near the start of the find output. By printing as we go, I get better feedback. The sort used here will be much slower on Perl versions older than 5.8.


    -2
    find . -type f|perl -lne '@x=sort {$b->[0]<=>$a->[0]}[(stat($_))[7],$_],@x;splice(@x,10);print "@{$x[0]}";END{for(@x){print "@$_"}}'
    bazzargh · 2012-01-08 14:43:43 0
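
The "print as we go" feedback can also be sketched in shell with awk. This is a rough sketch rather than a port of the Perl one-liner: it only reports each new record holder as it is found and does not keep a top-10 list. The function name running_max is made up for illustration, and the usage line assumes GNU find for -printf.

```shell
# Hypothetical helper (not from the original post).
# Reads "size<TAB>path" lines on stdin.
running_max() {
    awk -F'\t' '
        # Print a record only when its size beats every size seen so far,
        # and flush so the line shows up while find is still running.
        $1 + 0 > max { max = $1 + 0; print; fflush() }
    '
}

# Usage (assumes GNU find for -printf):
# find . -type f -printf '%s\t%p\n' | running_max
```

The last line printed is the largest file in the tree; earlier lines are the intermediate leaders, which is the feedback the alternative is after.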

What Others Think

The sort manpage is a little cryptic, but you can sort on fields other than the beginning of the line (similar to cut): "find . -type f -ls | sort -n --key=7". Pipe that to "cut -b68-" to get only the filenames.
bwoodacre · 483 weeks and 4 days ago
I thought about "find -ls" before, but it is a poor fit because the file name isn't always at the same position; that depends on the lengths of the owner, group, size, and date/time strings. The date/time length also changes in some locales.
fsilveira · 483 weeks and 4 days ago
Agreed! Another shoddy way is to pipe into "awk '{print $11}'" or some such to get the filenames, but they still have to be quoted or null-separated. However, I think you can get rid of cut and tr if you modify the -printf format string: find . -type f -printf '%s\t"%p"\n' | sort -n | cut -f2 | xargs ls -laS (the tab is for cut and the quotes are for xargs). Another option: on a single filesystem, use inode numbers to avoid messy filenames.
bwoodacre · 483 weeks and 4 days ago
The 'cut' could not be avoided in that command, and the 'tr' is there to handle filenames with spaces correctly, although filenames containing '\n' (which are *much* rarer) will not be handled correctly.
fsilveira · 483 weeks and 3 days ago
I see. Here's maybe an even shorter version. Have you seen the -print0 option to find? "find -type f -print0 | xargs -0 ls -la | sort -nr --key=5" handles ANY type of filename, even those with newlines and spaces.
bwoodacre · 482 weeks and 6 days ago
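
For what it's worth, the newline problem discussed above can be avoided by keeping every stage NUL-delimited. The following is only a sketch under assumptions: the function name sorted_paths_nul is made up, and it relies on GNU find's -printf with '\0', GNU sort -z, and GNU cut -z (the cut option needs a fairly recent coreutils).

```shell
# Hypothetical helper (not from the original thread); assumes GNU find,
# GNU sort -z, and GNU cut -z.
# Prints the paths of all regular files under $1, smallest first,
# each terminated by NUL instead of newline.
sorted_paths_nul() {
    # Emit "size<TAB>path" records ending in NUL, sort them numerically
    # by the leading size, then drop the size field. -f2- keeps the rest
    # of the record intact even if the path itself contains tabs.
    find "${1:-.}" -type f -printf '%s\t%p\0' | sort -z -n | cut -z -f2-
}

# Usage: the NUL-terminated list can go straight to xargs -0.
# sorted_paths_nul . | xargs -0 ls -ldSr
```

Because the list is already in size order when it reaches xargs, it should not matter if xargs splits it over several ls invocations, which is the same trick the original command uses.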
