Hide

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again.

Delete that bloated snippets file you've been using and share your personal repository with the world. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.


If you have a new feature suggestion or find a bug, please get in touch via http://commandlinefu.uservoice.com/

Get involved!

You can sign-in using OpenID credentials, or register a traditional username and password.

First-time OpenID users will be automatically assigned a username which can be changed after signing in.

Hide

Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for:

Hide

News

2011-03-12 - Confoo 2011 presentation
Slides are available from the commandlinefu presentation at Confoo 2011: http://presentations.codeinthehole.com/confoo2011/
2011-01-04 - Moderation now required for new commands
To try and put and end to the spamming, new commands require moderation before they will appear on the site.
2010-12-27 - Apologies for not banning the trolls sooner
Have been away from the interwebs over Christmas. Will be more vigilant henceforth.
2010-09-24 - OAuth and pagination problems fixed
Apologies for the delay in getting Twitter's OAuth supported. Annoying pagination gremlin also fixed.
Hide

Tags

Hide

Functions

Commands matching find duplicates from sorted by
Terminal - Commands matching find duplicates - 82 results
awk '!x[$0]++' <file>
2009-12-20 02:33:21
User: din7
Functions: awk
64

Using awk, find duplicates in a file without sorting, which reorders the contents. awk will not reorder them, and still find and remove duplicates which you can then redirect into another file.

find . -type f -print0|xargs -0 md5sum|sort|perl -ne 'chomp;$ph=$h;($h,$f)=split(/\s+/,$_,2);print "$f"."\x00" if ($h eq $ph)'|xargs -0 rm -v --
2009-06-07 03:14:06
Functions: find perl rm xargs
19

This one-liner will the *delete* without any further confirmation all 100% duplicates but one based on their md5 hash in the current directory tree (i.e including files in its subdirectories).

Good for cleaning up collections of mp3 files or pictures of your dog|cat|kids|wife being present in gazillion incarnations on hd.

md5sum can be substituted with sha1sum without problems.

The actual filename is not taken into account-just the hash is used.

Whatever sort thinks is the first filename is kept.

It is assumed that the filename does not contain 0x00.

As per the good suggestion in the first comment, this one does a hard link instead:

find . -xdev -type f -print0 | xargs -0 md5sum | sort | perl -ne 'chomp; $ph=$h; ($h,$f)=split(/\s+/,$_,2); if ($h ne $ph) { $k = $f; } else { unlink($f); link($k, $f); }'
find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'
2014-02-25 22:50:09
User: knoppix5
Functions: awk du echo find ls read sort wc
1

Very quick! Based only on the content sizes and the character counts of filenames. If both numbers are equal then two (or more) directories seem to be most likely identical.

if in doubt apply:

diff -rq path_to_dir1 path_to_dir2

AWK function taken from here:

http://stackoverflow.com/questions/2912224/find-duplicates-lines-based-on-some-delimited-fileds-on-line

sudo find /usr/share/hunspell/ -lname '*' -delete
2010-11-07 15:11:02
User: atti
Functions: find sudo
0

When you right click a text box in Firefox and you have installed a few dictionaries you'll see a loooong list of spellcheckers. Most of them are duplicated (symlinks). This command deletes de duplicates and reduces the list.

find-duplicates () { find "$@" -not -empty -type f -printf "%s\0" | sort -rnz | uniq -dz | xargs -0 -I{} -n1 find "$@" -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate; }
2013-01-23 23:20:26
User: mpeschke
Functions: find md5sum sort uniq xargs
0

This is a modified version of the OP, wrapped into a bash function.

This version handles newlines and other whitespace correctly, the original has problems with the thankfully rare case of newlines in the file names.

It also allows checking an arbitrary number of directories against each other, which is nice when the directories that you think might have duplicates don't have a convenient common ancestor directory.

find -not -empty -type f -printf "%-30s'\t\"%h/%f\"\n" | sort -rn -t$'\t' | uniq -w30 -D | cut -f 2 -d $'\t' | xargs md5sum | sort | uniq -w32 --all-repeated=separate
2014-10-19 02:00:55
User: fobos3
Functions: cut find md5sum sort uniq xargs
0

Finds duplicates based on MD5 sum. Compares only files with the same size. Performance improvements on:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

The new version takes around 3 seconds where the old version took around 17 minutes. The bottle neck in the old command was the second find. It searches for the files with the specified file size. The new version keeps the file path and size from the beginning.

fdupes DIRECTORY/ -r -d
2011-03-05 09:48:35
User: desmatron
-1

allow you to find duplicates files in "DIRECTORY" and choose wich one to delete

fudpes must be installed: sudo apt-get install fdupes

find . -type f -exec md5 '{}' ';' | sort | uniq -f 3 -d | sed -e "s/.*(\(.*\)).*/\1/"
2012-01-14 08:54:12
User: noahspurrier
Functions: find sed sort uniq
-1

This works on Mac OS X using the `md5` command instead of `md5sum`, which works similarly, but has a different output format. Note that this only prints the name of the duplicates, not the original file. This is handy because you can add `| xargs rm` to the end of the command to delete all the duplicates while leaving the original.