




List all duplicate directories

find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'
2014-02-25 22:50:09
User: knoppix5
Functions: awk du echo find ls read sort wc

Very quick! It is based only on each directory's content size (du -s) and the character count of its filename listing (ls -1 | wc -m). If both numbers are equal, two (or more) directories are most likely identical.

If in doubt, apply:

diff -rq path_to_dir1 path_to_dir2

AWK function taken from here:

http://stackoverflow.com/questions/2912224/find-duplicates-lines-based-on-some-delimited-fileds-on-line
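
As a quick sanity check, here is a minimal test case (the /tmp paths and file names are purely illustrative): two identical toy directories and one that differs, run from a scratch directory:

mkdir -p /tmp/dupetest/a /tmp/dupetest/b /tmp/dupetest/c
echo hello > /tmp/dupetest/a/file.txt
echo hello > /tmp/dupetest/b/file.txt
echo "something else" > /tmp/dupetest/c/other.txt
cd /tmp/dupetest
find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'

./a and ./b should be reported as a duplicate pair; . and ./c should not appear.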


What others think

*flatcap shrieks in terror

After looking at the command for five minutes, I ran it on my work directory.

There are no duplicate dirs in my work area, but your command says otherwise.

Most of the false dupes were .git dirs.

Problem 1: counting the number of chars in filenames (in the root) is a very poor measure of "sameness".

My git repos all look the same and so do many of my test dirs (example.c Makefile).

Risk of false positives (high)
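
As a rough illustration of that false positive (hypothetical paths): two directories that each hold a single example.c with completely different contents report the same filename character count, and on most filesystems the same du -s size, so the heuristic pairs them up:

mkdir -p /tmp/fptest/proj1 /tmp/fptest/proj2
echo 'int main(void){return 0;}' > /tmp/fptest/proj1/example.c
echo 'a completely different program' > /tmp/fptest/proj2/example.c
du -s /tmp/fptest/proj1 /tmp/fptest/proj2    # usually prints identical block counts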

Problem 2: "du -s" is bad in both directions. It will match different dirs of the same size

AND it WON'T match identical dirs in some circumstances.

Create a directory with 1000 files in, then delete 999 of them. Now copy that directory.

Most likely "du -s dir_one dir_two" will show different sizes.

Risk of false positives (high). Risk of false negatives (medium).
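
A rough way to reproduce that false negative (assuming a filesystem like ext4, where directory blocks are not shrunk when entries are deleted):

mkdir -p /tmp/dutest/dir_one
cd /tmp/dutest/dir_one
for n in $(seq 1 1000); do echo x > "file_$n"; done   # grow the directory to 1000 entries
rm file_{1..999}                                      # keep only file_1000
cd ..
cp -r dir_one dir_two                                 # dir_two only ever held one entry
du -s dir_one dir_two                                 # sizes usually differ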

Problem 3: awk uses $1$2 as its index. This means that $1=1 $2=23 will match $1=12 $2=3.

This is unlikely to be seen in sorted numbers, but it is possible.

Risk of false positive (very low).
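
A quick way to see the collision (the traversal order of awk's array is unspecified):

printf '1 23\n12 3\n' | awk '{ count[$1$2]++ } END { for (k in count) print k, count[k] }'

This prints "123 2": two different lines collapse onto a single index.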

Now to the commands. The sort command can be simplified:

sort -sn -k1,1 -k2,2

You're echoing the results of ls and du, therefore the numbers will be space-delimited. So: awk -F space is OK

awk -F' '

I mentioned the index being risky, so this is safer:

{ idx=$1"."$2;

A simple dot to separate the two numbers. awk doesn't care about the type of idx.
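
Re-running the collision example with the dot separator shows the two lines now get distinct indexes:

printf '1 23\n12 3\n' | awk '{ count[$1"."$2]++ } END { for (k in count) print k, count[k] }'

This prints "1.23 1" and "12.3 1" (in some order).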

Now the rest of the awk program. It looks like it was copied from the awk one-liner to "uniq" the input.

It's storing every line in array, when it only needs to keep the previous one (the input is sorted).

{ new=$1"."$2; if (new == old) { if (oldline) { print oldline; oldline = ""; } print; } else { old = new; oldline = $0; } }

This does the same thing, but using less memory.

new = numbers from current line

old = number from previous line (empty to start)

oldline = copy of previous entire line (in case the new one matches)

To save space :-) it can be condensed to:

{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}

Leaving the new command:

find . -type d|while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -sn -k1,1 -k2,2|awk -F' ' '{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}'

Enjoy :-)

Comment by flatcap 79 weeks ago

Thank you, flatcap, for your constructive criticism.

I noticed there was no command here for finding duplicate directories.

The command above can give a rough starting point for estimating candidates for a more accurate test with:

diff -rq path_to_dir1 path_to_dir2

which helps if disk space is about to be exhausted (that is, when some hard links have to be considered).

Comment by knoppix5 79 weeks ago

I had a good think about how I'd solve the problem.

But I didn't come up with any reliable solutions that didn't involve a lot of processing power.

I'll keep thinking...

Comment by flatcap 79 weeks ago

Any fs property which can be expressed numerically will do. What about:

time ...echo $(ls -1 "$i"|wc -m)...|wc -l

159

real 0m1.104s

user 0m0.088s

sys 0m0.112s

time ...echo $(ls "$i"| tee >(egrep -c a)...|wc -l

109

real 0m2.105s

user 0m0.060s

sys 0m0.144s

109 lines of output vs. 159, so the risk of false positives should be lower(?).

Comment by knoppix5 78 weeks and 6 days ago

actually

echo $(ls "$i"| tee >(egrep -c a) >(egrep -c e)|tail -2|tr -d '\n')

Comment by knoppix5 78 weeks and 6 days ago
