List all duplicate directories

find . -type d| while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -s -n -k1,1 -k2,2 |awk -F'[ \t]+' '{ idx=$1$2; if (array[idx] == 1) {print} else if (array[idx]) {print array[idx]; print; array[idx]=1} else {array[idx]=$0}}'

Very quick! It is based only on the content sizes and the character counts of the file names. If both numbers are equal, two (or more) directories are likely to be identical. If in doubt, apply:

diff -rq path_to_dir1 path_to_dir2

AWK function taken from here: http://stackoverflow.com/questions/2912224/find-duplicates-lines-based-on-some-delimited-fileds-on-line

Sample Output

215 1320 ./hwebcam048.kopija
215 1320 ./hwebcam048
24 16 ./ac3dlx/lib/tk8.5/ttk/CVS
24 16 ./ac3dlx/lib/tk8.4/CVS
24 16 ./ac3dlx/tcl/CVS
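
Since matching numbers only mark candidates, each pair still needs the diff -rq check mentioned in the description. A minimal sketch of that check follows; same_dir is a hypothetical helper name, and the two paths are simply taken from the sample output, so they are assumed to exist under the current directory.

```shell
#!/bin/sh
# same_dir is a hypothetical helper name, not part of the original command.
# It succeeds only if diff -rq reports no differing or missing files.
same_dir() {
    diff -rq -- "$1" "$2" >/dev/null 2>&1
}

# Example with two candidates from the sample output (assumed to exist here).
if same_dir ./hwebcam048 ./hwebcam048.kopija; then
    echo "identical"
else
    echo "different (or not comparable)"
fi
```

diff exits with a non-zero status both when the trees differ and when one of them cannot be read, which is why the else branch is worded cautiously.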

By: knoppix5

2014-02-25 22:50:09

awk du echo find ls read sort wc diff awk directories

What Others Think

*flatcap shrieks in terror*

After looking at the command for five minutes, I ran it on my work directory. There are no duplicate dirs in my work area, but your command says otherwise. Most of the false dupes were .git dirs.

Problem 1: Counting the number of chars in filenames (in the root) is a very poor measure of "sameness". My git repos all look the same and so do many of my test dirs (example.c Makefile). Risk of false positives (high).

Problem 2: "du -s" is bad in both directions. It will match different dirs of the same size AND it WON'T match identical dirs in some circumstances. Create a directory with 1000 files in it, then delete 999 of them. Now copy that directory. Most likely "du -s dir_one dir_two" will show different sizes. Risk of false positives (high). Risk of false negatives (medium).

Problem 3: awk uses $1$2 as its index. This means that $1=1 $2=23 will match $1=12 $2=3. This is unlikely to be seen in sorted numbers, but it is possible. Risk of false positives (very low).

Now to the commands. The sort command can be simplified:

sort -sn -k1,2

You're echoing the results of ls and du, therefore the numbers will be space-delimited. So awk -F space is OK:

awk -F' '

I mentioned the index being risky, so this is safer:

{ idx=$1"."$2;

A simple dot to separate the two numbers. awk doesn't care about the type of idx.

Now the rest of the awk program. It looks like it was copied from the awk one-liner to "uniq" the input. It's storing every line in array, when it only needs to keep the previous one (the input is sorted).

{ new=$1"."$2; if (new == old) { if (oldline) { print oldline; oldline = ""; } print; } else { old = new; oldline = $0; } }

This does the same thing, but using less memory.

new = numbers from current line
old = numbers from previous line (empty to start)
oldline = copy of previous entire line (in case the new one matches)

To save space :-) it can be condensed to:

{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}

Leaving the new command:

find . -type d|while read i; do echo $(ls -1 "$i"|wc -m) $(du -s "$i"); done|sort -sn -k1,2 |awk -F' ' '{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}'

Enjoy :-)

flatcap · 529 weeks and 1 day ago
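
Two points from the comment above can be checked on made-up input lines rather than real directories. The sketch below first counts distinct awk keys with and without the dot separator, then runs the condensed previous-line program on unsorted lines. count_keys and adjacent_dupes are hypothetical helper names. One assumption worth verifying on your own system: with -n, GNU sort appears to read a key spanning -k1,2 only up to the first non-numeric character, so it orders by the first field alone. The sketch therefore sorts with -k1,1 -k2,2, which keeps lines with equal pairs next to each other.

```shell
# count_keys and adjacent_dupes are hypothetical helper names for this sketch.

# Print how many distinct keys awk sees without and with a separator.
count_keys() {
    awk '{ joined[$1 $2]; dotted[$1 "." $2] }
         END { for (k in joined) j++; for (k in dotted) d++; print j, d }'
}

# Sort on each numeric field separately, then print groups of adjacent
# lines whose two leading numbers match (the condensed program above).
adjacent_dupes() {
    sort -n -k1,1 -k2,2 |
    awk '{n=$1"."$2;if(n==o){if(l){print l;l="";}print;}else{o=n;l=$0;}}'
}

# Made-up input: the pairs (1, 23) and (12, 3) are different.
printf '1 23 ./a\n12 3 ./b\n' | count_keys
# prints "1 2": the joined keys collide, the dotted keys do not

# Made-up, unsorted lines; only ./x and ./z share both numbers.
printf '5 2 ./x\n5 1 ./y\n5 2 ./z\n' | adjacent_dupes
# prints the ./x and ./z lines
```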

Thank you, flatcap, for your constructive criticism. I noticed that there was no command here for finding duplicate directories. The command above gives a rough starting point for picking candidates for a more accurate test with:

diff -rq path_to_dir1 path_to_dir2

This is useful if disk space is about to run out (i.e. when hard links are being considered).

knoppix5 · 529 weeks and 1 day ago

I had a good think about how I'd solve the problem. But I didn't come up with any reliable solutions that didn't involve a lot of processing power. I'll keep thinking...

flatcap · 529 weeks and 1 day ago
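
One possible direction, which does cost the processing power mentioned above because every file is read, is to hash the contents of each directory tree. This is only a sketch under stated assumptions, not a solution proposed in this thread: dir_fingerprint and fingerprint_tree are hypothetical helper names, sha256sum is assumed to be available (GNU coreutils), and file names are assumed not to contain newlines. Empty subdirectories, permissions and timestamps are ignored.

```shell
# dir_fingerprint is a hypothetical helper: one hash per directory tree,
# built from the relative paths and contents of all regular files in it.
# Assumes GNU coreutils (sha256sum) and file names without newlines.
dir_fingerprint() {
    (
        cd -- "$1" || exit 1
        find . -type f -exec sha256sum {} + |
            LC_ALL=C sort | sha256sum | cut -d' ' -f1
    )
}

# fingerprint_tree is also hypothetical: print "fingerprint path" for every
# directory below the given root, sorted so equal fingerprints are adjacent.
fingerprint_tree() {
    find "$1" -type d | while IFS= read -r d; do
        printf '%s %s\n' "$(dir_fingerprint "$d")" "$d"
    done | sort
}
```

Running fingerprint_tree . and looking for adjacent lines with the same first field gives the candidates. Nested directories are hashed once per ancestor, so this is far slower than the size-based command.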

Any fs property which can be expressed as a number will do. What about:

time ...echo $(ls -1 "$i"|wc -m)...|wc -l
159
real 0m1.104s
user 0m0.088s
sys 0m0.112s

time ...echo $(ls "$i"| tee >(egrep -c a)...|wc -l
109
real 0m2.105s
user 0m0.060s
sys 0m0.144s

109 lines of output vs. 159, so the risk of false positives is lower(?).

knoppix5 · 529 weeks ago

Actually:

echo $(ls "$i"| tee >(egrep -c a) >(egrep -c e)|tail -2|tr -d '\n')

knoppix5 · 529 weeks ago
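
The letter-count idea above can also be written as a single awk pass, which avoids the bash-only process substitution and does not rely on the order in which the two egrep results arrive. This is a sketch of the same measure, not the command from the comment; name_letter_counts is a hypothetical helper name.

```shell
# name_letter_counts is a hypothetical helper: for the entries of one
# directory, print how many names contain "a" and how many contain "e".
name_letter_counts() {
    ls "$1" | awk '/a/ { a++ } /e/ { e++ } END { print a + 0, e + 0 }'
}
```

The output is two space-separated numbers, so the awk index built from the leading fields would have to be extended to cover both of them.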
