This dup finder saves time by comparing size first, then md5sum, it doesn't delete anything, just lists them.
This one-liner will the *delete* without any further confirmation all 100% duplicates but one based on their md5 hash in the current directory tree (i.e including files in its subdirectories).
Good for cleaning up collections of mp3 files or pictures of your dog|cat|kids|wife being present in gazillion incarnations on hd.
md5sum can be substituted with sha1sum without problems.
The actual filename is not taken into account-just the hash is used.
Whatever sort thinks is the first filename is kept.
It is assumed that the filename does not contain 0x00.
As per the good suggestion in the first comment, this one does a hard link instead:
find . -xdev -type f -print0 | xargs -0 md5sum | sort | perl -ne 'chomp; $ph=$h; ($h,$f)=split(/\s+/,$_,2); if ($h ne $ph) { $k = $f; } else { unlink($f); link($k, $f); }'
Show Sample Output
I'm working in a group project currently and annoyed at the lack of output by my teammates. Wanting hard metrics of how awesome I am and how awesome they aren't, I wrote this command up. It will print a full repository listing of all files, remove the directories which confuse blame, run svn blame on each individual file, and tally the resulting line counts. It seems quite slow, depending on your repository location, because blame must hit the server for each individual file. You can remove the -R on the first part to print out the tallies for just the current directory. Show Sample Output
This is a very simple and lightweight way to play DI.FM stations
For a more complete version of the command with proper strings in the menu, try: (couldnt fit in the command field above)
zenity --list --width 500 --height 500 --title 'DI.FM' --text 'Pick a Radio' --column 'radio' --column 'url' --print-column 2 $(curl -s http://www.di.fm/ | awk -F '"' '/href="http:.*\.pls.*96k/ {print $2}' | sort | awk -F '/|\.' '{print $(NF-1) " " $0}') | xargs mplayer
This command line parses the html returned from http://di.fm and display all radio stations in a nice graphical menu. After the radio is chosen, the url is passed to mplayer so the music can start
dependencies:
- x11 with gtk environment
- zenity: simple app for displaying gtk menus (sudo apt-get install zenity on ubuntu)
- mplayer: simple audio player (sudo apt-get install mplayer on ubuntu)
Show Sample Output
echo "http%3A%2F%2Fwww.google.com" | sed -e's/%\([0-9A-F][0-9A-F]\)/\\\\\x\1/g' | xargs echo -e
http://www.google.com
Works under bash on linux. just alter the '-e' option to its corresponding equivalence in your system to execute escape characters correctly.
checks which files are not under version control, fetches the names and runs them through "svn add". WARNING: doesn't work with white spaces.
Using xargs is better than:
find /path/to/dir -type f -exec rm \-f {} \;
as the -exec switch uses a separate process for each remove. xargs splits the streamed files into more managable subsets so less processes are required.
Create a tgz archive of all the files containing local changes relative to a subversion repository.
Add the '-q' option to only include files under version control:
svn st -q | cut -c 8- | sed 's/^/\"/;s/$/\"/' | xargs tar -czvf ../backup.tgz
Useful if you are not able to commit yet but want to create a quick backup of your work. Of course if you find yourself needing this it's probably a sign you should be using a branch, patches or distributed version control (git, mercurial, etc..)
This helped me find a botnet that had made into my system. Of course, this is not a foolproof or guarantied way to find all of them or even most of them. But it helped me find it.
no loop, only one call of grep, scrollable ("less is more", more or less...)
xargs can be used in this manner to download multiple files at a time, and xargs will in this case run 10 processes at a time and initiate a new one when the number running falls below 10. Show Sample Output
Search for files and list the 20 largest.
find . -type f
gives us a list of file, recursively, starting from here (.)
-print0 | xargs -0 du -h
separate the names of files with NULL characters, so we're not confused by spaces
then xargs run the du command to find their size (in human-readable form -- 64M not 64123456)
| sort -hr
use sort to arrange the list in size order. sort -h knows that 1M is bigger than 9K
| head -20
finally only select the top twenty out of the list
Show Sample Output
This command will find the biggest files recursively under a certain directory, no matter if they are too many. If you try the regular commands ("find -type f -exec ls -laSr {} +" or "find -type f -print0 | xargs -0 ls -laSr") the sorting won't be correct because of command line arguments limit. This command won't use command line arguments to sort the files and will display the sorted list correctly. Show Sample Output
change the *.avi to whatever you want to match, you can remove it altogether if you want to check all files.
I needed a way to search all files in a web directory that contained a certain string, and replace that string with another string. In the example, I am searching for "askapache" and replacing that string with "htaccess". I wanted this to happen as a cron job, and it was important that this happened as fast as possible while at the same time not hogging the CPU since the machine is a server. So this script uses the nice command to run the sh shell with the command, which makes the whole thing run with priority 19, meaning it won't hog CPU processing. And the -P5 option to the xargs command means it will run 5 separate grep and sed processes simultaneously, so this is much much faster than running a single grep or sed. You may want to do -P0 which is unlimited if you aren't worried about too many processes or if you don't have to deal with process killers in the bg. Also, the -m1 command to grep means stop grepping this file for matches after the first match, which also saves time. Show Sample Output
This command is useful when you want to know what process is responsible for a certain GUI application and what command you need to issue to launch it in terminal. Show Sample Output
Very useful set of commands to know when your file system was created. Show Sample Output
Run this in the directory you store your music in. mp3gain and vorbisgain applies the ReplayGain normalization routine to mp3 and ogg files (respectively) in a reversible way. ReplayGain uses psychoacoustic analysis to make all files sound about the same loudness, so you don't get knocked out of your chair by loud songs after cranking up the volume on quieter ones.
1. find file greater than 10 MB 2. direct it to xargs 3. xargs pass them as argument to ls Show Sample Output
Adapted using your usefull comments !
Maybe simpler, but again, don't know how it will work with space in filename.
commandlinefu.com is the place to record those command-line gems that you return to again and again. That way others can gain from your CLI wisdom and you from theirs too. All commands can be commented on, discussed and voted up or down.
Every new command is wrapped in a tweet and posted to Twitter. Following the stream is a great way of staying abreast of the latest commands. For the more discerning, there are Twitter accounts for commands that get a minimum of 3 and 10 votes - that way only the great commands get tweeted.
» http://twitter.com/commandlinefu
» http://twitter.com/commandlinefu3
» http://twitter.com/commandlinefu10
Use your favourite RSS aggregator to stay in touch with the latest commands. There are feeds mirroring the 3 Twitter streams as well as for virtually every other subset (users, tags, functions,…):
Subscribe to the feed for: