Commands matching find duplicates sorted by votes

Commands matching find duplicates (10)

Remove duplicate entries in a file without sorting.

Sorting a file to find duplicates reorders its contents. This awk one-liner avoids that: it keeps the first occurrence of each line, drops later repeats, and leaves the original order intact. Redirect the output into another file to save the result.
86

awk '!x[$0]++' <file>
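A minimal sketch of the same idea wrapped in a function, in case it helps to see it with a readable array name. The function name dedupe_keep_order and the array name seen are my own choices, not part of the original command. The expression seen[$0]++ is zero (false) the first time a line appears, so negating it prints only first occurrences.

```shell
# Hypothetical wrapper around the one-liner above.
# Prints each distinct line once, in order of first appearance.
# Reads the files given as arguments, or stdin when there are none.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}
```

Usage: dedupe_keep_order input.txt > output.txt (both file names are placeholders).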

din7 · 2009-12-20 02:33:21 24
DELETE all those duplicate files but one based on md5 hash comparison in the current directory tree

This one-liner will *delete*, without any further confirmation, all 100% duplicates but one, based on their md5 hash, in the current directory tree (i.e. including files in its subdirectories). Good for cleaning up collections of mp3 files or pictures of your dog|cat|kids|wife that are present in a gazillion incarnations on your hard disk. md5sum can be substituted with sha1sum without problems. The actual filename is not taken into account; only the hash is used. Whatever sort thinks is the first filename is kept. Filenames containing newlines are not handled, because the md5sum output is processed line by line. As per the good suggestion in the first comment, this variant replaces each duplicate with a hard link instead of just deleting it:
find . -xdev -type f -print0 | xargs -0 md5sum | sort | perl -ne 'chomp; $ph=$h; ($h,$f)=split(/\s+/,$_,2); if ($h ne $ph) { $k = $f; } else { unlink($f); link($k, $f); }'
This is sample output - yours may be different.
```
removed `./duplicate0.mp3'
removed `./1/duplicate1.mp3'
removed `./2/duplicate2.mp3'
```
19

find . -type f -print0|xargs -0 md5sum|sort|perl -ne 'chomp;$ph=$h;($h,$f)=split(/\s+/,$_,2);print "$f"."\x00" if ($h eq $ph)'|xargs -0 rm -v --
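Since the command above deletes without asking, a dry run may be worth doing first. The following is only a sketch under assumptions: the function name list_md5_duplicates is made up here, it relies on GNU md5sum, xargs -r, and filenames without newlines or backslashes (md5sum escapes those). It prints the files the one-liner would remove and deletes nothing.

```shell
# Hypothetical dry-run helper: list files whose MD5 hash equals that of
# an earlier file in sorted order. Nothing is deleted.
# $1 is the directory to scan (defaults to the current directory).
list_md5_duplicates() {
    find "${1:-.}" -type f -print0 |
        xargs -0 -r md5sum |
        sort |
        awk '{
            hash = substr($0, 1, 32)             # md5sum prints 32 hex digits
            if (hash == prev) print substr($0, 35) # then two separator chars
            prev = hash
        }'
}
```

Usage: list_md5_duplicates . | less, then run the deleting version once the list looks right.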

masterofdisaster · 2009-06-07 03:14:06 15
Find Duplicate Files (based on MD5 hash) -- For Mac OS X

This works on Mac OS X using the `md5` command instead of `md5sum`; it works similarly but has a different output format. Note that this prints only the names of the duplicates, not the original file. This is handy because you can add `| xargs rm` to the end of the command to delete all the duplicates while leaving the original. File names containing whitespace are not handled, because `uniq -f 3` counts whitespace-separated fields.
2

find . -type f -exec md5 '{}' ';' | sort -k 4 | uniq -f 3 -d | sed -e "s/.*(\(.*\)).*/\1/"

noahspurrier · 2012-01-14 08:54:12 10
Find Duplicate Files (based on size first, then MD5 hash)

Finds duplicates based on MD5 sum, comparing only files that have the same size. This is a performance improvement on:
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
The new version takes around 3 seconds where the old version took around 17 minutes. The bottleneck in the old command was the second find, which searches the tree again for the files of each specified size. The new version keeps the file path and size from the beginning.
2

find -not -empty -type f -printf "%-30s'\t\"%h/%f\"\n" | sort -rn -t$'\t' | uniq -w30 -D | cut -f 2 -d $'\t' | xargs md5sum | sort | uniq -w32 --all-repeated=separate
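The size-first idea can also be spelled out in a longer form, which may be easier to follow than the fixed-width columns above. This is a sketch, not the author's method: the function name find_dupes_by_size_then_md5 is invented here, it uses awk to keep only paths whose size occurs more than once, and it assumes GNU find, xargs and uniq and filenames without newlines.

```shell
# Hypothetical helper: hash only files that share a size with another file.
# $1 is the directory to scan (defaults to the current directory).
find_dupes_by_size_then_md5() {
    find "${1:-.}" -type f -not -empty -printf '%s\t%p\n' |
        awk -F'\t' '
            {
                size[NR] = $1
                path[NR] = substr($0, length($1) + 2)  # everything after the first tab
                count[$1]++
            }
            END {
                # Print only paths whose size was seen at least twice.
                for (i = 1; i <= NR; i++)
                    if (count[size[i]] > 1) print path[i]
            }
        ' |
        xargs -r -d '\n' md5sum |
        sort |
        uniq -w32 --all-repeated=separate
}
```

Files with a unique size never reach md5sum, which is where the time saving comes from. Groups of identical files are separated by a blank line in the output.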

fobos3 · 2014-10-19 02:00:55 10
List all duplicate directories

Very quick! Based only on the content sizes and the character counts of filenames. If both numbers are equal, two (or more) directories are most likely identical. If in doubt, run: diff -rq path_to_dir1 path_to_dir2
AWK function taken from here: http://stackoverflow.com/questions/2912224/find-duplicates-lines-based-on-some-delimited-fileds-on-line
This is sample output - yours may be different.
```
215 1320 ./hwebcam048.kopija
215 1320 ./hwebcam048
24 16 ./ac3dlx/lib/tk8.5/ttk/CVS
24 16 ./ac3dlx/lib/tk8.4/CVS
24 16 ./ac3dlx/tcl/CVS
```
1

find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'
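Because the listing above is only a heuristic, candidate pairs can be confirmed with diff -rq as the description suggests. A small sketch, with the function name dirs_identical being my own invention: it relies on diff exiting with status 0 only when it finds no differences.

```shell
# Hypothetical helper: succeed (exit 0) only if the two directory trees
# have the same files with the same contents.
dirs_identical() {
    diff -rq -- "$1" "$2" >/dev/null
}
```

Usage: dirs_identical ./hwebcam048 ./hwebcam048.kopija && echo "really identical"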

knoppix5 · 2014-02-25 22:50:09 27
Delete duplicated dictionaries in spell check list

When you right-click a text box in Firefox with a few dictionaries installed, you'll see a loooong list of spellcheckers. Most of them are duplicates (symlinks). This command deletes the duplicates and shortens the list.
0

sudo find /usr/share/hunspell/ -lname '*' -delete

atti · 2010-11-07 15:11:02 6
Find Duplicate Files (based on size first, then MD5 hash)

This is a modified version of the OP, wrapped into a bash function. This version handles newlines and other whitespace correctly; the original has problems with the thankfully rare case of newlines in file names. It also allows checking an arbitrary number of directories against each other, which is nice when the directories that you think might have duplicates don't have a convenient common ancestor directory.
0

find-duplicates () { find "$@" -not -empty -type f -printf "%s\0" | sort -rnz | uniq -dz | xargs -0 -I{} -n1 find "$@" -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate; }

mpeschke · 2013-01-23 23:20:26 5

Find Duplicate Files (based on size, name, and md5sum)

It works extremely fast because it calculates md5sum only for files that have the same size and name. But nothing comes for free: it won't find duplicates that have different names.

find -type f -printf '%20s\t%100f\t%p\n' | sort -n | uniq -Dw121 | awk -F'\t' '{print $3}' | xargs -d '\n' md5sum | uniq -Dw32 | cut -b 35- | xargs -d '\n' ls -lU

ant7 · 2017-05-21 02:26:16 16

Find & remove files that are duplicates but with different extensions recursively

Very useful for getting rid of backup files or files with the wrong extension that lurk in your folders. In this example, I first run two searches for all filenames with the extensions .jpg and .png, then strip the extension and output only the names that appear in both lists. I loop over these results, print a log line, and delete the file with the extension I dislike.
This is sample output - yours may be different.
```
deleted: contrib/image/4legs/cat.png
deleted: contrib/image/4legs/dog.png
deleted: contrib/image/withwings/fly.png
```
0

for f in `comm -1 -2 <(sort <(find contrib/image/ -name '*.png' | sed 's/\.[^.]*$//')) <(sort <(find contrib/image/ -name '*.jpg' | sed 's/\.[^.]*$//'))`;do rm "$f.png" && echo "deleted: $f.png";done
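The backtick loop above splits on whitespace, so paths with spaces will trip it up. Here is a sketch of an alternative that checks for the .jpg twin per file instead of comparing two lists. The function name remove_png_with_jpg_twin is hypothetical, and like the original it really deletes, so try it on a copy first.

```shell
# Hypothetical helper: delete every .png that has a .jpg with the same
# base name next to it, logging each deletion.
# $1 is the directory to scan (defaults to the current directory).
remove_png_with_jpg_twin() {
    find "${1:-.}" -type f -name '*.png' -exec sh -c '
        for png in "$@"; do
            jpg="${png%.png}.jpg"          # same path, .jpg extension
            if [ -f "$jpg" ]; then
                rm -- "$png" && printf "deleted: %s\n" "$png"
            fi
        done
    ' sh {} +
}
```

Usage: remove_png_with_jpg_twin contrib/image/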

TheoNaciri · 2017-11-28 15:48:15 24
find duplicate files in a directory and choose which one to delete

Lets you find duplicate files in "DIRECTORY" and choose which one to delete. fdupes must be installed: sudo apt-get install fdupes
-1

fdupes DIRECTORY/ -r -d

desmatron · 2011-03-05 09:48:35 5
