Commands tagged duplicate files (3)

  • Finds duplicates based on MD5 sum. Compares only files with the same size. Performance improvements on: find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate The new version takes around 3 seconds where the old version took around 17 minutes. The bottle neck in the old command was the second find. It searches for the files with the specified file size. The new version keeps the file path and size from the beginning.


    2
    find -not -empty -type f -printf "%-30s'\t\"%h/%f\"\n" | sort -rn -t$'\t' | uniq -w30 -D | cut -f 2 -d $'\t' | xargs md5sum | sort | uniq -w32 --all-repeated=separate
    fobos3 · 2014-10-19 02:00:55 10
  • * Find all file sizes and file names from the current directory down (replace "." with a target directory as needed). * sort the file sizes in numeric order * List only the duplicated file sizes * drop the file sizes so there are simply a list of files (retain order) * calculate md5sums on all of the files * replace the first instance of two spaces (md5sum output) with a \0 * drop the unique md5sums so only duplicate files remain listed * Use AWK to aggregate identical files on one line. * Remove the blank line from the beginning (This was done more efficiently by putting another "IF" into the AWK command, but then the whole line exceeded the 255 char limit). >>>> Each output line contains the md5sum and then all of the files that have that identical md5sum. All fields are \0 delimited. All records are \n delimited.


    0
    find . -type f -not -empty -printf "%-25s%p\n"|sort -n|uniq -D -w25|cut -b26-|xargs -d"\n" -n1 md5sum|sed "s/ /\x0/"|uniq -D -w32|awk -F"\0" 'BEGIN{l="";}{if(l!=$1||l==""){printf "\n%s\0",$1}printf "\0%s",$2;l=$1}END{printf "\n"}'|sed "/^$/d"
    alafrosty · 2013-10-22 13:34:19 8
  • To allow recursivity : find -type f -exec md5sum '{}' ';' | sort | uniq -c -w 33 | sort -gr | head -n 5 | cut -c1-7,41- Display only filenames : find -maxdepth 1 -type f -exec md5sum '{}' ';' | sort | uniq -c -w 33 | sort -gr | head -n 5 | cut -c43- Show Sample Output


    0
    find -maxdepth 1 -type f -exec md5sum '{}' ';' | sort | uniq -c -w 33 | sort -gr | head -n 5 | cut -c1-7,41-
    MaDCOw · 2017-02-09 11:36:31 18

What's this?

commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.

Share Your Commands


Check These Out

check open ports (both ipv4 and ipv6)
Check open TCP and UDP ports

Transfer large files/directories with no overhead over the network
This invokes tar on the remote machine and pipes the resulting tarfile over the network using ssh and is saved on the local machine. This is useful for making a one-off backup of a directory tree with zero storage overhead on the source. Variations on this include using compression on the source by using 'tar cfvp' or compression at the destination via $ ssh user@host "cd dir; tar cfp - *" | gzip - > file.tar.gz

bash alias for sdiff: differ
this is just okey

list any Linux files without users or groups
suspicious/anomalous ownership may indicate system breach; should return no results

Limit the transfer rate and size of data over a pipe
This example will close the pipe after transferring 100MB at a speed of 3MB per second.

Have a random "cow" say a random thing
You need to have fortune and cowsay installed. It uses a subshell to list cow files in you cow directory (this folder is default for debian based systems, others might use another folder). you can add it to your .bashrc file to have it great you with something interesting every time you start a new session.

Find usb device in realtime
Using this command you can track a moment when usb device was attached.

Watch active calls on an Asterisk PBX
Show active calls as the happen on an Asterisk server. Note that the Asterisk command (in single quotes) is formatted for Asterisk 1.6. Use the -n flag on the watch command to modify the refresh period (in seconds - default is 2 seconds).

Generate soothing noise
Substitute 'brown' with 'pink' or 'white' according to your taste. I put this on my headphones when I'm working in an "open concept" office, where there are always three to five conversations going in earshot, or if I'm working somewhere it is "rude" of me to tell a person to turn off their cubicle radio.

Process command output line by line in a while loop
This snippet allows to process the output of any bash command line by line.


Stay in the loop…

Follow the Tweets.

Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.

» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10

Subscribe to the feeds.

Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):

Subscribe to the feed for: