This duplicate finder saves time by comparing file sizes first and computing an md5sum only for files whose sizes match. It doesn't delete anything; it just lists the duplicates.
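The size-first idea can be sketched roughly as follows. This is not the original one-liner, just a minimal illustration that assumes GNU find, md5sum and uniq; the function name list_dups is made up for the example.

```shell
#!/usr/bin/env bash
# list_dups DIR (hypothetical helper): print groups of files with identical content.
# Step 1: collect the file sizes that occur more than once under DIR.
# Step 2: hash only the files that have one of those sizes.
# Step 3: group lines by the 32-character md5 hash; groups are separated by a blank line.
list_dups() {
  local dir=${1:-.}
  find "$dir" -type f ! -empty -printf '%s\n' | sort -n | uniq -d |
    while read -r size; do
      find "$dir" -type f -size "${size}c" -exec md5sum {} +
    done |
    sort | uniq -w 32 --all-repeated=separate
}
```

Files with a unique size are never hashed, which is where the time saving comes from.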
This uses Bash's "process substitution" feature to compare (using diff) the output of two different process pipelines.
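As a small illustration of process substitution with diff (not the original command), the sketch below compares the sorted file listings of two directories. It needs bash; the function name compare_listings is invented for the example.

```shell
#!/usr/bin/env bash
# compare_listings DIR_A DIR_B (hypothetical helper):
# diff the sorted relative file lists of two directories.
# Each <( ... ) runs a pipeline and hands its output to diff as if it were a file.
compare_listings() {
  diff <(cd "$1" && find . -type f | sort) \
       <(cd "$2" && find . -type f | sort)
}
```

diff exits with status 1 when the listings differ, so check the status or the output depending on what you need.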
Useful mainly for debugging or troubleshooting an application or system, such as X11, Apache, Bind, DHCP and others. Another useful switch that can be combined with -mmin, -mtime and so forth is -daystart. For example, to find files that were modified in the /etc directory only yesterday:
sudo find /etc -daystart -mtime 1 -type f
I have a bash alias for this command line and find it useful for searching C code for error messages. The -H tells grep to print the filename. You can omit the -i to match the case exactly, or keep the -i for case-insensitive matching. This find command finds all .c and .h files.
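The command itself is not shown here, so the following is only a guess at its shape. It uses a shell function rather than an alias so the pattern can be passed as an argument; the name grep_c is an assumption.

```shell
# grep_c PATTERN [DIR] (hypothetical helper):
# case-insensitive search of all .c and .h files under DIR (default: current directory).
# -H prints the file name with each match; drop -i to match case exactly.
grep_c() {
  find "${2:-.}" -type f \( -name '*.c' -o -name '*.h' \) \
    -exec grep -iH -- "$1" {} +
}
```

For example, grep_c 'out of memory' src would list every matching line in the C sources under src.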
Find files in a specific date range - in this case, the first half of last year.
-newermt = modification time of the file is more recent than this date
GNU find allows any date specification that GNU date would accept, e.g.
find . -type f -newermt "3 years ago" ! -newermt "2 years ago"
or
find . -type f -newermt "last monday"
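The "first half of last year" case mentioned at the start can be sketched like this, assuming GNU find. The function name is hypothetical and the year is computed from the current date.

```shell
# first_half_last_year [DIR] (hypothetical helper):
# list files modified after 1 January and not after 1 July of the previous year.
first_half_last_year() {
  local year
  year=$(( $(date +%Y) - 1 ))
  find "${1:-.}" -type f -newermt "$year-01-01" ! -newermt "$year-07-01"
}
```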
Compacts the information databases of Firefox to speed up the browser's launch.
This makes an alias for a command named 'busy'. The 'busy' command opens a random file in /usr/include at a random line with vim. Drop this in your .bash_aliases and make sure that file is sourced from your .bashrc.
This has helped me numerous times when trying to find log files or temporary files that get created after a command is executed. It is also really eye-opening as to how active a given process really is. Play around with -anewer, -cnewer and -newerXY.
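One way to apply the idea, as a rough sketch rather than the original command: create a reference file, run the command you are curious about, then ask find for everything newer than the reference. The function name changed_since_run is an assumption.

```shell
# changed_since_run DIR CMD... (hypothetical helper):
# run CMD, then list files under DIR that were modified after CMD started.
changed_since_run() {
  local dir=$1 stamp
  shift
  stamp=$(mktemp)   # reference file; its mtime marks the starting point
  "$@"              # the command under observation
  find "$dir" -type f -newer "$stamp"
  rm -f "$stamp"
}
```

-newer is a strict comparison, so a file written within the same timestamp tick as the reference file may be missed.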
If you don't want to delete them, but just want to list them, do
find -L /path -type l
If you want to delete them with confirmation first, do
find -L /path -type l -exec rm -i {} +
Using the -L flag follows symlinks, so the -type l test only returns true if the link can't be followed, or is a symlink to another broken symlink.
This one-liner will *delete*, without any further confirmation, all but one of each set of 100% duplicates, based on their md5 hash, in the current directory tree (i.e. including files in its subdirectories).
Good for cleaning up collections of mp3 files or pictures of your dog|cat|kids|wife that are present in a gazillion incarnations on your hard disk.
md5sum can be substituted with sha1sum without problems.
The actual filename is not taken into account; only the hash is used.
Whatever sort thinks is the first filename is kept.
It is assumed that the filename does not contain 0x00.
As per the good suggestion in the first comment, this one does a hard link instead:
find . -xdev -type f -print0 | xargs -0 md5sum | sort | perl -ne 'chomp; $ph=$h; ($h,$f)=split(/\s+/,$_,2); if ($h ne $ph) { $k = $f; } else { unlink($f); link($k, $f); }'
Taken from: http://www.webmasterworld.com/forum40/1310.htm
Calculates the md5 sum of each file, sorts the results (required for uniq to work), runs uniq on the hash only, and uses cut to remove the hash from the result.
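The pipeline described above might look like the sketch below. The exact uniq options are an assumption (here, print every member of each duplicate group), as is the function name dup_names; GNU md5sum and uniq are assumed.

```shell
# dup_names [DIR] (hypothetical helper):
# print the names of files whose content occurs more than once under DIR.
# md5sum prints a 32-character hash, two separator characters, then the name,
# so uniq compares the first 32 characters and cut keeps column 35 onwards.
dup_names() {
  find "${1:-.}" -type f -exec md5sum {} + |
    sort |
    uniq -w 32 --all-repeated=separate |
    cut -c 35-
}
```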
This command finds and prints all the symbolic and hard links to a file. Note that the file argument can itself be a link, and the command will find the original file as well.
You can also do this with the inode number for a file or directory by first using stat or ls or some other tool to get the number like so:
stat -Lc %i file
or
ls -Hid file
And then using:
find -L / -inum INODE_NUMBER -exec ls -ld {} +
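When you have the file name rather than the inode number, GNU find's -samefile test does the lookup in one step. The sketch below is an assumption about how the command could be written, with a made-up function name; it restricts the search to a directory argument instead of always scanning /.

```shell
# links_to FILE [DIR] (hypothetical helper):
# list every hard link and symlink under DIR (default: /) that resolves to FILE.
# With -L, find follows symlinks, so -samefile also matches symlinks to FILE.
links_to() {
  find -L "${2:-/}" -samefile "$1" 2>/dev/null
}
```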
This command finds all files in the current directory and its subdirectories, and replaces all occurrences of "oldstring" in every file with "newstring".
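A minimal sketch of such a command, assuming GNU sed for the in-place -i option; the function name replace_in_tree is hypothetical and the original command may differ.

```shell
# replace_in_tree OLD NEW [DIR] (hypothetical helper):
# replace every occurrence of OLD with NEW in all regular files under DIR.
# OLD is treated as a sed regular expression, and neither argument may contain "/".
replace_in_tree() {
  find "${3:-.}" -type f -exec sed -i "s/$1/$2/g" {} +
}
```

Because the files are edited in place, it is worth trying this on a copy of the tree first.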
A quick way to find and delete empty directories, starting in the current working directory. Running find . -empty -type d first shows what would be removed, as a test run.
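The delete step could be written as below. This is a sketch assuming GNU find's -delete action; the function name is made up, and -mindepth 1 is added so the starting directory itself is never removed.

```shell
# prune_empty_dirs [DIR] (hypothetical helper):
# print and remove every empty directory below DIR.
# -delete implies -depth, so a directory that only held empty directories goes too.
prune_empty_dirs() {
  find "${1:-.}" -mindepth 1 -type d -empty -print -delete
}
```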
This can be much faster than downloading one or both trees to a common server and comparing the files there. Afterwards, only the files that differ need to be copied down for deeper comparison.